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Abstract. 

The Collaboratory for the Study of Earthquake Predictability (CSEP) aims to prospec- 
tively test time-dependent earthquake probability forecasts on their consistency with ob- 
servations. To compete, time-dependent seismicity models are calibrated on earthquake 
catalog data. But catalogs contain much observational uncertainty. We study the impact 
of magnitude uncertainties on rate estimates in clustering models, on their forecasts and 
on their evaluation by CSEP's consistency tests. First, we quantify magnitude uncertain- 
ties. We find that magnitude uncertainty is more heavy-tailed than a Gaussian, such as 
a double-sided exponential distribution, with scale parameter v c = 0.1 — 0.3. Second, 
we study the impact of such noise on the forecasts of a simple clustering model which 
captures the main ingredients of popular short term models. We prove that the devia- 
tions of noisy forecasts from an exact forecast are power law distributed in the tail with 
exponent a = (a^ c ) _1 , where a is the exponent of the productivity law of aftershocks. 
We further prove that the typical scale of the fluctuations remains sensitively dependent 
on the specific catalog. Third, we study how noisy forecasts are evaluated in CSEP con- 
sistency tests. Noisy forecasts are rejected more frequently than expected for a given con- 
fidence limit. The Poisson assumption of the consistency tests is inadequate for short- 
term forecast evaluations. To capture the idiosyncrasies of each model together with any 
propagating uncertainties, the forecasts need to specify the entire likelihood distribution 
of seismic rates. 



1. Introduction 

Earthquake prediction experiments such as the recently 
formed Collaboratory for the Study of Earthquake Pre- 
dictability (CSEP) [Jordan, 2006] and the Working Group 
on Regional Earthquake Likelihood Models (RELM) [Schor- 
lemmer et al, 2007] aim to investigate scientific hypotheses 
about seismicity in a systematic, rigorous and truly prospec- 
tive manner by evaluating the forecasts of models against 
observed earthquake parameters (time, location, magnitude, 
focal mechanism, etc) that are taken from earthquake cata- 
logs. After the optimism of the 1970s followed by pessimism 
on earthquake prediction [e.g. Geller, 1997], the lesson for 
the next generation of earthquake forecasters is clear: model 
formulation, calibration and hypothesis testing must be ro- 
bust, especially with respect to data quality issues. 

Paleoseismology provides a good example of the impor- 
tance of uncertainties in data analysis and hypothesis test- 
ing. Rhoades et al. [1994] showed that the hazard rate 
on the Pallett Creek segment of the San Andreas fault can 
vary by a factor of three depending on parameter estimates 
of any chosen model for the earthquake cycle, all consistent 
with the data. Davis et al. [1989] showed that parame- 
ter uncertainties and their continuous updating with time 
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lead to vastly different probability forecasts of the Parkfield 
segment. In the context of renewal processes, Sornette and 
Knopoff [1997] showed that different distributions of inter- 
event times, which may all be compatible with the data, 
have drastically diverging implications for the conditional 
waiting time until the next earthquake. Ogata [1999, 2002] 
concluded that uncertainties in the occurrence times of his- 
torical quakes make differentiating between different renewal 
distributions for the hazard on one fault inconclusive. 

CSEP and RELM and future experiments have at their 
disposal earthquake catalogs of much higher quality than 
paleoseismic studies, of course. Nevertheless, these mod- 
ern earthquake catalogs are ridden with their own obser- 
vational uncertainties, biases, arbitrary conventions and 
spatio-temporally varying quality characteristics. While 
RELM acknowledges data problems and plans to simulate 
so-called "modified observations" from the actual observa- 
tions and their error estimates in order to test the forecasts 
against these alternative, potentially equally likely scenar- 
ios, their proposed statistics and tests were not shown to be 
sufficient to solve data quality issues at the various stages 
of the hypothesis testing process, in particular with respect 
to the generation of model forecasts. 

But models for short-term forecasts are typically quite 
sensitive to recent earthquake catalog data that the models 
are calibrated on. For instance, the seismic rate forecasts in 
popular clustering and aftershock models are exponentially 
dependent on main shock magnitudes. These models include 
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the Epidemic Type Aftershock Sequence (ETAS) Model 
[Ogata, 1988; see Helmstetter et al, 2006, for an implemen- 
tation], Short Term Earthquake Probabilities (STEP) [Ger- 
stenberger et al., 2005], the model by Reasenberg and Jones 
[1989, 1994], the Epidemic Rate and State (ERS) model by 
Console et al. [2007] and the model by Kagan and Knopoff 
[1987]. Small errors in reported magnitudes can therefore 
have a large, detrimental effect on forecasts. Rather than 
entering at the evaluation phase, the magnitude uncertain- 
ties impact during the input or calibration stage - not ac- 
counted for by the current RELM/CSEP evaluation tests. 
To the best of our knowledge, no study has addressed the 
impact of data quality on seismic rate estimates, model fore- 
casts and their evaluation. 

Here, we analyze how observational errors in magnitude 
estimates propagate to seismic rate estimates and forecasts 
in common aftershock models. While other data quality 
issues are certainly important, we focus here on magnitude 
uncertainties, which seem to be the most important source of 
noise for short term clustering models because of the mod- 
els' exponential sensitivity on past magnitudes. However, 
because magnitude uncertainties are not routinely reported 
in earthquake catalogs, we first study the accuracy of mag- 
nitude estimates in section 2. In section 3, we then use a 
simple aftershock model containing the basic ingredients of 
most operational short term clustering models and inves- 
tigate the impact of the magnitude errors on seismic rate 
estimates and forecasts. Finally, we conduct a numerical 
experiment in which we mimic the RELM/CSEP testing 
center to investigate the impact of noise on the evaluation 
of forecasts generated from noisy magnitude data. 

2. Magnitude Uncertainties 

2.1. Different types of magnitudes and their errors 

Magnitude is a measurement unit originally introduced 
by Charles Richter and Beno Gutenberg during the 1930s 
at the California Institute of Technology to measure the 
"size" of earthquakes. Different magnitudes measure dif- 
ferent characteristics of seismic waves (or of earthquake- 
generated tsunamis or surface deformation), although all 
aim to quantify the "size" of an earthquake. For exam- 
ple, the original Richter magnitude (Ml) uses the maximum 
amplitude of seismic waves and the time difference between 
P and S waves recorded on a Wood- Anderson seismograph 
along with correction factors for geometric spreading and 
attenuation. Body wave magnitudes (mj) and surface wave 
magnitudes (Ms) are based on amplitudes of P and surface 
waves often near periods of Is and 20s, respectively. Both 
may include corrections for focal depth, the period used, 
the attenuation and potentially a station correction for the 
inhomogeneity of the soil. 

The moment magnitude [Kanamori, 1977; Hanks and 
Kanamori, 1979] is based on the scalar seismic moment, 
which is the magnitude of the seismic moment tensor. The 
moment tensor, in turn, is a representation of the equivalent 
body forces that would produce the same radiated seismic 
wave pattern as the observed one. Under a point source ap- 
proximation, the seismic moment is equal to the product of 
the rigidity of the material, the average displacement on the 
fault and the average fault area that slipped. The moment 
is a long period (zero frequency) measure of the total energy 
of the earthquake and in theory does not saturate. The mo- 
ment magnitude has a clear physical interpretation, while 
the relationship of other magnitudes to earthquake source 
parameters is less (if at all) established. 

Many other definitions and conventions exist for magni- 
tudes. For instance, the Advanced National Seismic System 
(ANSS) [Benz et al, 2005] reports various magnitudes, in- 
cluding local (Richter-type) magnitudes, body wave magni- 
tudes, moment magnitudes and coda duration magnitudes. 
Few earthquake catalogs homogeneously use the same mag- 
nitude type to measure earthquakes. The (Harvard) Cen- 
troid Moment Tensor (CMT) project catalog [e.g. Ekstrom 



et al., 2005] is a rare exception of a relatively uniform global 
catalog. 

In principle, each magnitude type needs to be addressed 
separately to establish uncertainties. For non-physics-based 
magnitudes (i.e. all but the moment magnitude), this can 
be particularly difficult as they are conventions by defini- 
tion and cannot be verified independently. However, even in 
these cases it is possible to establish estimates of the errors 
in the convention-based magnitude estimate. For instance, 
it is possible to analyze the effects of discretization of media 
and equations, of the measurement precision of seismome- 
ters, of the assumed velocity and attenuation models of the 
Earth (spherically symmetric, 3D, etc) and of the resolution 
of the inversion algorithm depending, e.g., on station cover- 
age. We will use the term intra-magnitude uncertainties to 
refer to such individual magnitude error estimates. These 
uncertainties measure how close a magnitude estimate may 
be to its convention-based "true" value. 

More fundamentally, the definition of an earthquake 
"event" (and hence an associated magnitude) can be ques- 
tioned. The identification of one earthquake seems inher- 
ently tied to the particular time, space and frequency resolu- 
tion of the observer. Kagan and Knopoff [1981] constructed 
a branching model which mimics a continuous deformation 
flow. When a Green's function is applied to the deformation 
and a scale imposed, the resulting seismograms seem to show 
separate events. Peng et al. [2007] analyzed the properties 
of catalogs after large earthquakes and found by handpick- 
ing events from the waveforms that many more events are 
present than detected by catalog routines, providing further 
observational evidence to the idea that the scale of the ob- 
server may determine the definition of an event. In this 
article, we will assume that listed catalog magnitudes are 
nevertheless useful for extrapolating and forecasting seismic 
rates. This is the working assumption of all seismicity-based 
earthquake forecasting experiments. 

Earthquake prediction experiments such as RELM and 
CSEP use a so-called "authorized data stream" for their 
"natural laboratories" [Schorlemmer et al, 2007]. For Cali- 
fornia, this data stream is the ANSS catalog. Models accept 
the listed magnitudes to generate forecasts of future events, 
irrespective of the type of magnitude listed. The forecast 
validation is also performed against the listed magnitudes. 
Apart from the intra-magnitude uncertainties, one should 
therefore also consider the uncertainties of one particular 
magnitude estimate in relation to the magnitude that best 
forecasts future events (if there exists such a "forecast mag- 
nitude"). For instance, the moment magnitude may be more 
relevant in predicting aftershocks than a body wave magni- 
tude: A. Helmstetter (personal communication, 2007) ob- 
served unbroken scaling of the number of aftershocks with 
moment magnitude oc 10 qMw up to Mw =9.3 (the great 
December 26 2004 Sumatra- Andaman earthquake). Physi- 
cal mechanisms for earthquake triggering might also be con- 
strained by the knowledge of the "forecast magnitude." 

But lacking this magnitude, we need to consider the un- 
certainties between the different types of magnitudes: the 
inter-magnitude uncertainties. We will study both inter- 
and intra-magnitude uncertainties to get a sense of the scale 
of the uncertainties. We then use these error estimates to 
simulate noisy magnitudes and to study their impact on seis- 
mic rate estimates and forecasts in section 3. 

2.2. Intra-Magnitude Uncertainties 

Ideally, intra-magnitude uncertainties are reported by 
earthquake catalogs based on knowledge about the seismic 
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instruments and the inversion algorithm. Unfortunately, 
such information is often lacking in catalogs. A rare ex- 
ception is provided by the Northern California Seismic Net- 
work (NCSN) catalog operated by the U.S. Geological Sur- 
vey (USGS) and the Berkeley Seismological Laboratory at 
UC Berkeley. We study the reported uncertainties in section 
2.2.2. 

A simple alternative to studying these errors indepen- 
dently is to compare the magnitude estimates for the same 
event from different networks and inversion algorithms, e.g. 
from different catalogs. While one cannot assume that one 
measure is correct and indicative of the error in the other, 
one can make some inferences, especially if the catalogs seem 
uniform and trustworthy (established, e.g., by verifying com- 
pleteness, stability and other known statistical properties). 
We therefore study the differences in moment magnitudes 
as reported by two relatively well-studied and trusted cat- 
alogs: the (Harvard) Centroid Moment Tensor (CMT) cat- 
alog [Dziewonski et al, 1981; Dziewonski and Woodhouse, 
1983; Ekstrom et al, 2005] and the USGS moment tensor 
(MT) catalog [Sipkin, 1986, 1994]. 

2.2.1. Moment Magnitude Uncertainty From the 
(Harvard) CMT and USGS MT Catalogs 

Sipkin (1986, Figure 7) compared the CMT scalar mo- 
ment tensor with the USGS-MT scalar moment tensor. He 
generally found comparable values, but CMT moments were 
smaller by a factor of two for small events and slightly larger 
for the largest events. The scatter shows deviations of 0.25 
in logarithmic moment units. Helffrich (1997) compared the 
moment tensor solutions provided by three organizations: 
the Harvard group, the USGS and the Earthquake Research 
Institute of the University of Tokyo. He found a standard 
deviation of 0.21 in log 10 moment units between the three 
data sets. He also showed that the moment estimates sys- 
tematically improved (converged) for deeper events. Kagan 
(2003) found that the differences in moment magnitude esti- 
mates of matched events reported by the Harvard CMT and 
the USGS MT catalog have a standard deviation of 0.08 
and 0.12 for deep and shallow events, respectively. If both 
catalogs contain an equal amount of noise, then the stan- 
dard deviation of each estimate is equal to 0.05 and 0.08. 
Kagan (2002) concluded that the standard deviations for 
moment magnitude estimates in California was 0.08, while 
conventional (first motion) catalogs provided less accurate 
estimates with a standard deviation equal to 0.23. Since we 
are interested in simulating noisy magnitudes to study their 
impact on seismic rate forecasts and prediction experiments, 
we update and expand these analyses of moment magnitude 
to determine the entire distribution. 

We used the Harvard CMT catalog from 1 January 1977 
until 31 July 2006, which contains 25066 events above 
Mw > 3 and wrote an algorithm to match its events with 
the USGS MT catalog from January 1980 until 31 July 
2006, which contains 4952 events above Mw > 3. Both 
catalogs are available from http://neic.usgs.gov/neis/ 
sopar/. Neither catalog contains events between 1 Decem- 
ber 2005 and 31 March 2006. But since we are not interested 
in temporal properties of the catalogs, this gap should not 
bias our results. 

We consider two listings from the two catalogs to refer 
to the same event if they are separated in time by less than 
1 minute and in space by less than 150 km. Kagan (2003) 
used the same definitions. In agreement with his findings, 
the matches are quite robust with respect to space but less 
robust with respect to the condition on time. 

Using these conditions, we match 4923 pairs of events. 
Only 29 events listed in the USGS MT catalog cannot be 
matched with events in the Harvard CMT catalog. Of these, 
5 events pass the time requirement but fail the spatial con- 
dition. Thus one might suspect extreme errors in locating 
the events. However, increasing the spatial limit up to 1000 



km does not change the results, i.e. the differences do not 
seem to be due simply to large location errors. They seem 
to be events listed in the USGS MT catalog that are entirely 
absent from the Harvard CMT catalog. Most of the other 
24 events listed in the USGS MT that could not be matched 
with events Harvard CMT seem to be events in complex af- 
tershock sequences where the identification of single events 
may sensitively depend on certain different network char- 
acteristics and on choices in the two computer algorithms. 
Vice versa, many events listed in the Harvard CMT are ab- 
sent from the USGS MT, often due to a lower detection 
threshold in the Harvard CMT catalog. 

For matched events, we calculate the moment magnitude 
Mw from the scalar moment Mq (in Newton-meter) using 
the relation M w = 2/3 log 10 (M ) - 6 [Kagan, 2003] and 
analyze the differences in Mw between Harvard CMT and 
USGS MT estimates. Figure 1 shows the distribution of the 
differences in the moment magnitude estimates from the two 
different catalogs. Figure la) shows a fixed kernel density es- 
timate [e.g. Izenman, 1991] of the probability density func- 
tion (pdf ) of the magnitude uncertainties (solid) . Figure lb) 
shows the same data in a semi-logarithmic plot. Figures lc) 
and d) show semi-logarithmic plots of the survivor function 
and cumulative distribution function, respectively, in order 
to emphasize both tails. 

We performed a maximum likelihood fit of the data to 
a Laplace distribution (a double-sided exponential distribu- 
tion), defined by: 

p e (e) = J-el— *H (1) 

where v c is the scale (e-folding) parameter indicating the 
strength of the noise and (e) is a shift parameter equal to 
the median and mean of the distribution. The maximum 
likelihood estimate of (e) = 0.006 is given by the median 
and essentially indistinguishable from zero. To estimate the 
scale parameter v c (e-folding scale), we then took absolute 
values of the deviations from the median and fit the result- 
ing positive data set to an exponential using maximum like- 
lihood. We find that u c = 0.07 for the entire data set. The 
dashed lines in all plots of Figure 1 correspond to this maxi- 
mum likelihood fit (using (e) = 0.006 and v c = 0.07). The fit 
approximates the data well in the body, but underestimates 
the tails as can be seen in Figure lb), c) and d). 

To estimate the effect of the tails which are fatter than 
exponential, we determined the scale parameter v c as a func- 
tion of the threshold above which we fit the distribution. 
We determined the median of the data (corresponding to 
the threshold "0"), took absolute values of the data and 
performed a maximum likelihood fit to an exponential dis- 
tribution. We then increased the threshold (with respect 
to the median) and re-calculated the scale parameter for 
each threshold value. Figure 2 shows the resulting maxi- 
mum likelihood estimates of v c as a function of the thresh- 
old. The error bars show estimated 95 percent confidence 
bounds. Confirming the presence of the fat tails, the esti- 
mated e-folding scale v c increases with the threshold from 
about 0.07 to about 0.1. 

The distribution of the differences between magnitude es- 
timates is not the same as the distribution of the individ- 
ual magnitude uncertainties in one estimate (see the next 
section 2.2.2 for such a direct estimate). To obtain individ- 
ual uncertainties, one can assume, for instance, that both 
uncertainties are identically and independently distributed 
(i.i.d.). In this case, the distribution of the differences can be 
assumed to be the convolution of the two individual distribu- 
tions. In the case of Gaussian distributions, the convolution 
is also a Gaussian with variance equal to the sum of the 
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individual variances. Unfortunately, the Laplace distribu- 
tion is not endowed with the same property. The difference 
between two Laplace distributed random variables is not ex- 
actly another Laplace distributed random variable. 

While we cannot determine the distributions of the in- 
dividual uncertainties exactly, we can, however, constrain 
them. For instance, they cannot be Gaussian for the above- 
mentioned property that their differences would be Gaus- 
sian. The presence of exponentially or even more slowly de- 
caying tails indicates that the individual uncertainties have 
at least exponentially decaying tails. For instance, to ob- 
tain a distribution for the difference of two random vari- 
ables which looks approximately Laplace with scale param- 
eter 0.1, we found that using Laplace distributions for the 
individual variables requires them to have an e-folding pa- 
rameter equal to 0.07. We even found that the convolution 
of power law distributions was able to produce the observed 
tails more accurately. In summary, the individual uncer- 
tainty distributions must have tails that decay as slowly as 
or more slowly than exponential. 

2.2.2. Intra-Magnitude Uncertainties Reported by 
the NCSN 

In the context of the CSEP earthquake prediction exper- 
iment, the important intra-magnitude uncertainties should 
be evaluated in California and in the regions where natu- 
ral laboratories are being established around the world (e.g. 
Europe, New Zealand, West Pacific). For California, the 
earthquake catalog that provides the data for the RELM 
and CSEP experiments is the ANSS composite catalog. 

Many regional networks feed their data into the ANSS 
composite catalog. It is essentially a computer program with 
rules for merging the data [e.g. Oppenheimer, 2007]. The 
ANSS assigns so-called "authoritative" regions to each seis- 
mic network, meaning that in those regions only data from 
its designated network is accepted into the composite cata- 
log. The Northern California Seismic Network (NCSN) ful- 
fills this role for northern California. The earthquake data 
that is passed on to the ANSS by the NCSN in turn comes 
from two sources: the Berkeley Seismological Laboratory at 
the University of California, Berkeley (UCB) and the USGS 
at Menlo Park. 

The UCB reports mainly moment magnitudes and local 
magnitudes, while the USGS reports coda duration magni- 
tudes. Their policy for merging into ANSS is: moment mag- 
nitude supercedes local magnitude supercedes coda duration 
magnitude (David Oppenheimer, 2007, personal communi- 
cation). UCB does not report uncertainties, but the moment 
magnitude is believed to be the most stable with uncer- 
tainties around 0.1, while the scatter in local magnitudes is 
strongly affected by radiation pattern (approximately 0.15 
magnitude units) and can be as large as ±0.5 (Margaret 
Hellweg, 2007, personal communication). 

The USGS, on the other hand, does provide uncertain- 
ties to the ANSS catalog, based on inversions by the Hy- 
poinverse program, written by Fred W. Klein of the USGS 
[Klein, 2002] . The Hypoinverse code "processes files of seis- 
mic station data for an earthquake (like P-wave arrival times 
and seismogram amplitudes and durations) into earthquake 
locations and magnitudes." The summary magnitude for an 
event is the weighted median of the station magnitudes. 
Each station can report a magnitude for an event if its user- 
specified weight is non-zero. The final reported magnitude 
is the value for which half of the total weights are higher 
and half lower. 

The measure of uncertainty reported by the Hypoinverse 
program, available from the NCSN in its hypoinverse for- 
mat output and from the ANSS in its "raw" format, is the 
Median Absolute Difference (MAD) between the summary 
magnitude and the magnitudes from the other reporting sta- 
tions. 

The MAD value measures the variability of the magnitude 
estimates across several stations. It therefore probes the as- 
sumed velocity, attenuation and geometrical spreading mod- 
els and the differences in station properties (e.g. frequency 



response, gain etc). Systematic biases due to other reasons 
in the magnitude inversion, however, cannot be captured by 
this measure. For instance, phase picking or instrument cal- 
ibration may be systematically biased. Furthermore, only 
one measure (the median) of the entire distribution of the 
magnitudes from the different stations calculated for one 
event is reported. If the distributions are heavy-tailed, then 
the median may give a false sense of good measurement in 
the presence of large variability. 

Nevertheless, some inference can be made about the ac- 
curacy of determined magnitudes. We collected all earth- 
quakes in the NCSN's authoritative region of the ANSS, 
defined by a polygon with the following latitude and longi- 
tude coordinates: { 34.5, -121.25, 37.2167, -118.0167, 37.75, 
-118.25, 37.75, -119.5, 39.5, -120.75, 42.0, -121.4167, 42.0, 
-122.7, 43.0, -125.0, 40.0, -125.5, 34.5, -121.25 }. We se- 
lected data from 1/1/1984 to 31/12/2006 (inclusive) above 
a magnitude threshold m t h = 3. The data with MAD val- 
ues can be retrieved from the website http: //www.ncedc . 
org/ncedc/catalog-search.html by selecting output in the 
"raw" format. This yielded 6125 earthquakes. But, as al- 
ready mentioned, only the USGS reports MAD values for 
its magnitudes, leaving 3073 events. 

In Figure 3, we show a scatter plot of the MAD versus 
their associated duration magnitudes. To test for a decrease 
of MAD with increasing magnitude, we divided the magni- 
tudes into bins of size 0.5 and calculated the mean MAD 
for each bin. In the range 3.5 < m < 4.0 (2420 events), the 
mean MAD was 0.15; for 4.0 < m < 4.5 (372 events), the 
mean was 0.16; for 4.5 < m < 5.0 (22 events), the mean was 
0.20. The bin 5.0 < m < 5.5 had mean 0.27 but counted 
only 2 events. Rather than a decrease of MAD with increas- 
ing magnitude, we see some evidence for an increase. 

Figure 4 shows the kernel density estimate of the prob- 
ability density function (pdf) of the MAD values, the cu- 
mulative distribution function (cdf) and the survivor func- 
tion. For reference, we also included the 99% confidence 
limit (the MAD value for which 99% of all reported MAD 
values are smaller). While the mean of the values was 0.15 
and the standard deviation 0.1, the 99% confidence limit 
was only reached at 0.59. That the distribution is strongly 
non-Gaussian can also be seen from the bottom panel. The 
tails decay more slowly than an exponential, indicating that 
outliers occur relatively frequently. Indeed, the maximum 
MAD value was 1.72. 

Figure 5 shows a scatter plot of the MAD values versus 
the number of stations involved in the computation of each 
coda duration magnitude and its MAD value. When the 
number of stations involved is very small, we see large scat- 
ter - very small MAD values of less than 0.1 and very large 
values above 0.5. On the other hand, as more stations are 
involved, the smallest MAD values increase to about 0.1. 
This may indicate that MAD values less than 0.1 are due to 
too few stations involved in the computation and probably 
unreliable. (At the same time, we note that a Mn = 5.32 
event with MAD 0.38 was recorded by 328 stations, suggest- 
ing that large MAD values are real.) 

When the number of stations registering an event is small, 
this presumably implies that the event is small and/or re- 
mote. Given that many events are located by less than 10 
stations, we may have detected evidence that the ANSS is 
not complete in the authoritative region of the NCSN down 
to m = 3, because even some Md ~ 5 events are constrained 
only by few stations. 

Finally, it is difficult to interpret the group of large MAD 
values reported when few stations are involved. Perhaps the 
events belong to a particular group defined by a region or 
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period with special properties that are not well modeled by 
the Hypoinverse program. 

2.3. Inter-Magnitude Uncertainties 

As was mentioned before in this article, models in earth- 
quake prediction experiments indiscriminately use whatever 
type of magnitude is listed in the authorized data stream. 
Given the random and systematic differences between the 
magnitudes, the relevant magnitude uncertainties in the 
context of forecasting are in fact inter-magnitude uncer- 
tainties. Many studies have investigated the relation of one 
magnitude scale to another and their random scatter. We 
review some of those here. We then analyze two data sets: 
(i) the differences between the CMT moment magnitudes 
and their corresponding body or surface wave magnitudes 
from the catalog provided by the Preliminary Determina- 
tion of Epicenters (PDE, 2001) (section 2.3.1); and (ii) the 
differences between duration and local magnitudes in the 
NCSN (section 2.3.2). 

Sipkin (1986, Figures 1 and 2) compared the body wave 
magnitudes mb and surface wave magnitudes M$ from the 
PDE catalog with early USGS scalar moments. Body wave 
magnitudes were scattered by up to one magnitude unit 
for the same value of the seismic moment, while surface 
wave magnitudes were less scattered but also substantial. 
Dziewonski and Woodhouse (1983, Figures 14a and 14b) 
compared the CMT moment with the PDE body wave mag- 
nitudes and found a similar amount of scatter. 

Harte and Vere-Jones (1999) compared the properties of 
the local New Zealand catalog with the data from the PDE 
catalog, made available by the National Earthquake Infor- 
mation Center (NEIC). They concluded that the differences 
between the PDE's m b and the local catalog's Ml were ran- 
dom for shallow events, but up to 1 unit of magnitude in 
size. For deeper events, m b was systematically smaller than 
the local M L . 

Kuge (1992) found a systematic bias in the body wave 
magnitude mb reported by the International Seismic Cen- 
ter (ISC) and the converted seismic moment taken from 
the Harvard CMT catalog for intermediate and deep earth- 
quakes in Japan. He computed a "theoretical" mt based on 
the CMT seismic moment and regression relationships be- 
tween the moment and NEIS (National Earthquake Infor- 
mation Service) and ISC body wave magnitudes [Giardini, 
1988]. The systematic differences were on the order of 0.2 
to 0.3 units. 

Patton (2001) investigated the transportability of the 
Nuttli magnitude scale based on 1-Hz Lg waves to differ- 
ent regions of the world. He routinely found differences of 
different types of body wave magnitudes on the order of 0.3 
magnitude units. 

Kagan (2003) found that mt to Mw conversions could 
result in scatter with a standard deviation of 0.41. He also 
concluded that converting conventional magnitudes into mo- 
ment magnitude leads to uncertainties which are three to 
four times the errors in moment magnitude estimates (0.05 
to 0.08). 

2.3.1. Moment Magnitude vs Body and Surface 
Wave Magnitude in the CMT and PDE Catalogs 

The Harvard CMT catalog calculates moment magni- 
tudes when it receives notification of a large event either 
from the NEIC via the PDE system or from the ISC. We 
compared the seismic moments of the Harvard CMT with 
the original PDE body (mj) and/or surface wave (Ms) mag- 
nitudes. That large systematic differences exist between 
these magnitudes is well-known. Here, we look at the differ- 
ences between the Harvard Mw and the PDE mt and Ms 
estimates to evaluate their scale. 

We used the global Harvard CMT catalog from 1 
January 1976 until 31 December 2005 (available from 
http://www.globalcmt.org/CMTfiles.html in gzipped "ndk" 



format). We found 24583 events listed. Of these, we se- 
lected all events that were assigned the source "PDE" (21450 
events) and converted their scalar seismic moments to mo- 
ment magnitudes using the equation Mw = 2/3 log 10 (Mo) — 
6 [Kagan, 2003]. We found 21435 m b and 13363 M s values 
which we subtracted from their corresponding Mw magni- 
tudes. Figure 6 shows the resulting differences as a function 
of the Harvard CMT Mw ■ There are systematic trends and 
random scatter. The body-wave magnitude mb systemati- 
cally underpredicts Mw for about Mw > 5.2. Since m;, is 
based on relatively short periods, the energy in these fre- 
quency bands does not increase beyond this value and the 
scale saturates. The S-wave magnitude Ms, on the other 
hand, underpredicts Mw systematically but especially for 
M w < 7. 

Figure 7 shows fixed kernel density estimates of the prob- 
ability density functions (pdfs) of the two sets of differ- 
ences. The systematic shifts of both pdfs indicate systematic 
under-estimation of Mw- The means of the data are 0.26 
for mb and 0.42 for Ms- The widths of the pdfs indicate the 
random scatter. The standard deviations are approximately 
0.29 for m b and 0.26 for M s . 

For context, an ETAS model would predict 10 times as 
many aftershocks if the magnitude unit were inflated spuri- 
ously by one magnitude unit! These differences can have a 
profound impact on global testing experiments. 
2.3.2. Duration Magnitude vs. Local Magnitude in 
the NCSN Catalog 

The NCSN catalog reports both coda duration magni- 
tude Mo and maximum amplitude (local) magnitude M L 
in its Hypoinverse output format, available from http: 
//www. ncedc.org/ncedc/catalog-search. html. We used 
data from 1 January 1984 until 31 December 2006 in the 
region 33° to 43° latitude and -120° to -115° longitude. We 
selected earthquakes larger than the magnitude threshold 
mth = 3, leaving a total of 4679 events. We found 4595 re- 
ported MAD values for duration magnitudes Md and 2711 
MAD values reported for local magnitudes Ml- 

Whenever both magnitudes were available for the same 
event, we compared their values. However, we addition- 
ally required that at least one of the two magnitudes be 
equal to or larger than M(.j = 3. Despite selecting a mag- 
nitude cut-off mth = 3 in the search algorithm on the web- 
site http: //www.ncedc . org/ncedc/ catalog-search.html, 
we found 74 events out of the total 4679 where both Md 
and Ml were smaller than the prescribed cut-off. The ex- 
treme case was an event with Md = 0.35 and Ml = —0.43 
(we presume that the cut-off magnitude corresponds to the 
magnitude reported to the ANSS, which may be different, 
see beginning of section 2.2.2). Removing these 74 events, 
we are left with 4605 events which obey the condition that 
at least one of the two magnitudes be equal to or larger 
than 3. Out of these events, we found 2733 events for which 
both Md and Ml were reported. We then calculated the 
difference A = Md — Ml for these 2733 events. 

Figure 8 shows the differences as a function of Md- Re- 
call that at least one of the two magnitudes must be larger 
than 3 (but not necessarily both). Few events are reported 
with Ml > 3 and Md < 3. On the other hand, many events 
are reported with Ml < 3 and Md > 3. It is hard to discern 
a trend visually, but Ml seems to under-predict Md up to 
about Md = 3.5, then over-predict until about Md = 5.5. 

Figure 9 shows a fixed kernel density estimate (solid) of 
the probability density function of the differences A. The 
largest difference between the two was 2.87 magnitude units 
(note that the x-axis was cut at ±1). The mean of their dif- 
ferences was —0.015, essentially showing no systematic bias. 
The standard deviation was 0.3, while the e-folding scale pa- 
rameter is 0.2. Also shown are fits of the data to a Gaussian 
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distribution (dashed; mean equal to —0.015 and standard 
deviation equal to 0.3) and to a Laplace distribution (dash- 
dotted; median equal to —0.04 and scale parameter equal to 
0.2). While neither fit approximates well the central part of 
the pdf, the Laplace distribution provides much better fits 
to the bulk and tails of the data. The Gaussian distribution 
significantly underestimates the probability of outliers. 

2.4. Summary of Magnitude Uncertainty 

Firstly, we compared estimates of the moment magnitude 
from the CMT and USGS MT catalogs. We found that the 
Laplace distribution is a good approximation to the bulk of 
the magnitude differences but it underestimates the tails. 
Our characterization of the entire distribution of the mag- 
nitude differences implies that individual uncertainties are 
distributed with tails that decay exponentially or even more 
slowly. 

Secondly, we analyzed a data set directly relevant to 
CSEP predictability experiments. We analyzed MAD val- 
ues, a magnitude uncertainty measure, reported in the 
NCSN's authoritative region in the ANSS. We found (i) 
MAD values fluctuate up to 1.71 with an average of 0.15, (ii) 
there is no evidence that magnitude uncertainty decreases 
with increased magnitude, (iii) the region may not be com- 
plete down to rrid = 3, (iv) MAD values less than 0.1 seem 
unreliable, and (v) the 99th percentile of MAD values is only 
reached at 0.59. 

We also considered inter-magnitude uncertainties. These 
can be extremely large and include systematic differences. 
We found that PDE body and surface wave magnitudes are 
systematically smaller (with mean 0.26 and 0.42, respec- 
tively) and randomly scattered (with standard deviations 
0.29 and 0.26, respectively). 

Finally, we studied the differences between NCSN's du- 
ration and local magnitudes. We found that the Laplace 
distribution again provided an adequate fit to the differ- 
ences with scale factor 0.2 so that individual uncertainties 
have exponential or fatter-than-exponential tails. 

3. Impact of Magnitude Noise on Seismic 
Rate Estimates 

In the previous section, we studied magnitude uncertain- 
ties. How do these magnitude uncertainties propagate to 
seismic rate estimates and to forecasts? How can they influ- 
ence forecast evaluation tests and the rate of type I and II er- 
rors? In particular, how could they influence the CSEP pre- 
dictability experiments? In this section, we address the first 
question of the impact of magnitude noise on the estimates 
of seismic rates in short term clustering models. We use 
the knowledge of magnitude uncertainties we have gained in 
the previous section to model to simulate magnitude noise. 
In section 4, we conduct a numerical experiment to begin 
answering the second and third question. 

3.1. A Simple Aftershock Clustering Model 

Most short term seismicity models use three statistical 
laws to extrapolate rates into the future. These are the 
Gutenberg-Richter law for magnitude-frequency statistics, 
the Omori-Utsu law for the temporal distribution of after- 
shocks and the productivity law for the expected number of 
offspring of a mother-shock. Models based on these laws 
include the Epidemic Type Aftershock Sequence (ETAS) 
Model [Ogata, 1988], Short Term Earthquake Probabilities 
(STEP) [Gerstenberger et al, 2005], the model by Reasen- 
berg and Jones (1989, 1994), the Epidemic Rate and State 
(ERS) model by Console et al. (2007) and the model by 
Kagan and Knopoff (1987). Although there are important 



differences between these models, all of them employ the 
above-mentioned three laws to forecast events. In particular, 
all of them use the so-called productivity law, defined below 
in equation (3), in which the number of expected aftershocks 
is an exponential function of the mother magnitude. 

We choose the Poisson cluster model as the basis of our 
analysis [see Daley and Vere- Jones, 2003]. It is simpler than 
the above models, yet it captures the essence of the cluster- 
ing phenomenon through the above three laws. In partic- 
ular, it preserves the exponential sensitivity of the number 
of expected aftershocks on the magnitude of the mother- 
shock. We expect that the conclusions obtained below con- 
cerning the impact the magnitude errors on the uncertainty 
of predicted seismic rates carry over to the above mentioned 
models. For models that include secondary triggering, the 
impact of magnitude noise may even be strongly amplified. 

The model can be described as follows. It consists of two 
processes: a cluster center process and an aftershock pro- 
cess. The cluster center process creates the mother events 
(also called main shocks or background) which have not been 
triggered. This process is Poissonian and governed by a ho- 
mogeneous rate A c . The magnitudes (or marks) of the clus- 
ter centers are drawn independently from the Gutenberg- 
Richter distribution [Gutenberg and Richter, 1954]: 

p m (m) =/3e- 0(m - rn «\ m>m d (2) 

where m,d is an arbitrary cut-off determined by the detection 
threshold and /3 = Mog(lO) is a constant. We denote the 
marked cluster center process by {ti c , mi c }i<i c <iv c • Each of 
these mothers can trigger direct aftershocks. In contrast to 
cascading models such as ETAS, there is no secondary trig- 
gering, allowing for a simplified analytical treatment and 
faster simulations. The number of aftershocks is a random 
number drawn from a Poisson distribution with expectation 
given by the productivity law: 

p(m ic ) = k e a(m ^- md) (3) 

where k and a are constants and rm c are the magnitudes of 
the mothers. The productivity law captures the exponential 
dependence of the number of aftershocks on the magnitude 
of the mother. This exponential dependence suggests that 
fluctuations in the magnitudes due to noise will strongly af- 
fect any forecast, as we are going to demonstrate analytically 
and by numerical simulations. 

The threshold rrid, which measures the size of the small- 
est triggering earthquake below which earthquakes do not 
trigger, is arbitrarily set to the detection threshold. This 
unjustified assumption is known to bias the clustering pa- 
rameters [Sornette and Werner, 2005a, 2005b]. 

The triggered events are distributed in time according to 
the Omori-Utsu law [Utsu et al., 1995]: 

^- U ^ (t-tl + c y W 

where c and p are constants and the ti c are the occurrence 
times of the cluster centers. The marks of the aftershocks 
are distributed in the same manner as their mothers accord- 
ing to the Gutenberg-Richter law (2) . 

In summary, the rate of events (including aftershocks) of 
the marked Poisson cluster process is completely defined by 
its conditional intensity (or rate) [see Daley and Vere- Jones, 
2003]: 

/ , , U „a(m ic -m d ) \ 

\(t,m\m,e)=p m (m) A c + J2 (t _ ti +c)P ( 5 ) 
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where Hf is the history up to time t which need only include 
information about the cluster centers {i; e , rrii c }i<i c <jv c , as 
aftershocks do not trigger their own aftershocks and do 
not influence the future. The set of parameters 8 — 
{f3,\ c ,k,a,m d ,c,p} are assumed to be known. 

The (unmarked) intensity above the detection threshold 
rrid is simply given by integrating over m: 



\(t\m,e) = \ c + 



k e a(r ' 



i) 



ic\ti <t 



(t - U c + c)p 



(6) 



In section 4, we will use a spatio-temporal Poisson cluster 
process to mimic the CSEP experiment and its evaluation as 
closely as possible. The model is defined by the conditional 
intensity at time t and location r: 



A c+ £ 



A(t, r,m\Ht) = p m (rn)x 



k e a{m ^ - md) n (0.02 • 10 a5m *c y 



ic\t ic <t 



(t - t ic + c)p (0.02 • 10°- 5m 'c + | r - r^lJi+c 



(7) 

where we added a commonly used spatial decay function [see 
e.g. Helmstetter et al, 2005]. 

3.2. Magnitude Noise 

To study the impact of magnitude noise on seismic rate 
estimates and forecasts, we assume that each (true) magni- 
tude is perturbed identically and independently by an addi- 
tive noise term e 



m ic = m ic +e ic . 



(8) 



We do not need to perturb the magnitudes of the aftershocks 
because they have no influence on the future seismicity rate. 

We use our results on magnitude uncertainties from sec- 
tion 2 by modeling noise according to a Laplace distribu- 
tion characterized by zero mean (unbiased) and a scale (e- 
folding) parameter v c : 



P«(c) 



2v c 



(9) 



We believe that this is a conservative estimate of the distri- 
bution of the magnitude noise because the Laplace distribu- 
tion under-estimates the occurrence of large error outliers. 
In general, therefore, the fluctuations of the seismic rate are 
likely to be even larger than calculated below. 

We assume the parameters are known. In particular, we 
assume below that the seismic rate estimates from noisy 
magnitudes use the same parameters as the "true" rate. We 
do so to isolate the effect of magnitude uncertainties. A 
comprehensive analysis of uncertainties including trade-offs 
between parameter and data uncertainty is beyond the scope 
of this article and is most likely extremely model-dependent. 

3.3. Fluctuations in the Seismic Rates Due to Noisy 
Magnitudes 

To investigate the fluctuations of the seismic rate esti- 
mates due to noisy magnitudes, we compare the seismic 
rates estimated from perturbed magnitudes with the true 
seismic rate. The perturbed rates from the observable, noisy 
magnitudes m° are given by: 



\ p {t\H c t ) = \ c + J2 



k e 



a(m? -m d ) 



i c \ti <t 



(t - u c + c y 



(10) 



We consider the deviation AA(t) of the perturbed rates from 
the true rate, given by: 



A\(t\HZ) = \*{t\H c t ) - \{t\Ht) 



(11) 



In the remainder of this section, we characterize the fluctu- 
ations p(AA) of the deviations from the true rate AA(i|// t c ). 
In particular, we would like to know how strong the devia- 
tions can be for a given catalog and whether one can make 
general statements for any catalog. 

To obtain some properties of the distribution of A\(t\H t ), 
we rewrite equation (11) as a finite sum over the product 
Wi ■ Zi (see Appendix Al): 



ia\ti<t 



(12) 



where 



fee 



(t-ti + cY 



(13) 



are quenched weights that depend on the true magnitudes 
and occurrence times of the cluster centers, while 



(14) 



are random variables due to the random noise e. From equa- 
tion (A7) in Appendix A2, we see that the z; are power-law 
distributed with an exponent a — \jv c a that depends in- 
versely on the size of the noise v a and the exponent of the 
productivity law a. In Figure 10, we show the theoretical 
and simulated pdf of the random variable z for several values 
of the noise level v c . 

Equation (12) together with the knowledge that the ran- 
dom variables z;'s are power law distributed implies that the 
following proposition is true: 

Proposition 1: The deviations A\(t\Ht) of the perturbed 
seismic rates from the true seismic rate due to magnitude 
noise converge to bounded random variables with a distri- 
bution having a power law tail for A A — > co: 



p(AA) 



(AA)!+° 
and with a scale factor given by 



with a 



(15) 



(16) 



where the sum is over the iV c earthquakes in the catalog that 
occurred before the present time t and the Wi's are given by 
(13). 

Proof: See Appendix A3. 
Remarks: 

1. The deviations of a rate estimate based on noisy mag- 
nitudes from the exact rate are power-law distributed in the 
tail. Estimates of a seismic rate may therefore be wildly 
different from the "true" rate, simply because of magnitude 
uncertainties. 
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2. The exponent a determines how broadly the seis- 
mic rate estimates are distributed around the "true" rate. 
The exponent sensitively depends on the noise level v c and 
the productivity exponent a. For the often quoted value 
a = ln(10) = 2.3 [e.g. Felzer et al, 2004] and for v c = 
(0.1, 0.2, 0.3, 0.5), one obtains a = (4.34, 2.17, 1.45, 0.87), re- 
spectively. Even for relatively small levels of noise v c > 0.22, 
the variance of AA does not exist (a < 2), while for 
v a > 0.44, the average of AA does not exist (a < 1)! 

3. The power law tail does not depend on the specific 
history of the magnitudes and occurrence times. The same 
power law tail holds for any catalog. On the other hand, the 
scale factor Caa depends sensitively on the specific realiza- 
tion via the magnitudes and occurrence times (see Proposi- 
tion 2). 

4. Short term earthquake forecasts are directly (some- 
times indirectly) taken from such seismic rate estimates. 
The accuracy of the forecasts cannot be better than the 
seismic rate estimates. 

As a demonstration of the strong fluctuations, Figure 11 
shows simulations of the seismic rate deviations for various 
levels of noise v c = (0.1, 0.2, 0.3, 0.5). In each case, we simu- 
late N c = 100 cluster centers according to a Poisson process 
and randomly draw their magnitudes from the GR law. We 
then choose a time lag dt (days) between the last event in 
the cluster center process and the time at which we want 
to evaluate (forecast) the seismic rate. We calculate the 
"true" seismic rate from equation (6), using the parameter 
set 9 = {/3 = 2.3, X c = l,k = 0.01, a = 2.3, m d = 3,c = 
0.001, p = 1.2} and using the "true" simulated magnitudes. 
Next, we generate "perturbed" catalogs by adding noise to 
the simulated magnitudes 100, 000 times according to the 
double exponential noise defined in (1). For each perturbed 
catalog, we recalculate the "perturbed" seismic rate as be- 
fore using equation (6) but replacing the "true" magnitudes 
by noisy magnitudes. 

Figure 11 shows fixed kernel density estimates of the 
normalized differences between the perturbed and the true 
rates. In each panel, the deviations are shown for four 
different noise levels v c = (0.1,0.2,0.3,0.5). The different 
panels (top to bottom) are different choices of the time lag 
dt = {0.0001, 0.1, 1, 10} days since the last event in the pro- 
cess at which the rates are calculated. Figures 12 and 13 
show double logarithmic plots of the survivor functions for 
the two extreme cases dt — 0.0001 and dt = 10, respectively. 

The seismic rate estimates are very broadly distributed, 
but with different dependence on the specific catalog, de- 
pending on the time horizon dt. For small dt (smaller than 
the average inter-event time 1/A C between successive main 
shocks), the only relevant event that determines the distri- 
bution of the perturbed rates is the last one. Therefore, 
the (normalized) distributions for small dt are almost iden- 
tical to the distribution of the random variable z (up to 
the weight pre-factor associated with the last event). How- 
ever, as dt increases to become comparable to and larger 
than 1/A C , the rate is no longer dominated solely by the 
last event. Previous events can become increasingly signif- 
icant in determining the sum compared to the last event. 
Consider two earthquakes of similar magnitudes occurring 
at ti — and ti = 10 respectively. At time £ = 11, t — ti = \ 
and t — ti = 11 so that earthquake 2 will dominate at early 
times by the effect of the Omori law on the weights, since 
w 2 /w 1 {t = 11) = (11 + c) p /(l + c) p ~ 11. But at t = 20, 
say, the weight wi of the first earthquake has decreased only 
by a factor 2 P while the weight u>2 of the second earth- 
quake has decreased by a factor 10 p , so that the ratio is 
now io 2 /wi(i = 20) = (20 + c) p /(10 + c) p ~ 2. Since the 
weights are so strongly stochastic, each particular catalog re- 
alization will have a different number of events with vastly 
different weights that are "felt" by the rate as dt increases. 

We give some concrete numbers for illustration for two 
cases, dt = 0.0001 (Figure 12) and dt = 10 (Figure 13). 



With our parameter choices c = 0.001 and A c = 1, this 
ensures that dt — 0.0001 corresponds to the regime domi- 
nated by the last event in the catalog while the case dt — 10 
will show the impact of many past events. For dt = 0.0001 
and realistic u c = 0.2, 80% of the seismic rate estimates 
(and hence forecasts) deviate by more than 10%; almost 
two thirds are off by more than 20%; and almost one third 
are off by more than 50%. Even fluctuations of 100% occur 
11% of the time. For larger levels of noise, much stronger 
fluctuations can occur. For instance, 28% of rate estimates 
are off by more than 100% for v c — 0.5. For the case where 
seismic rate estimates (forecasts) are made for dt = 10 af- 
ter the last event, the percentages depend strongly on the 
particular cluster center realization. 

Apart from the dependence on a particular realization, 
there is another consequence of the increasing importance of 
previous events with time since the last event. As previous 
events are summed over, the distribution of the sum tends 
towards its stable law (either Gaussian or Levy). For a > 2 
{v c < 0.22), the central limit theorem applies to the body 
of the distribution (though not the tails, see e.g. Sornette, 
2004) and tends to organize the distribution of the devia- 
tions towards a Gaussian. At the same time, only a finite 
number of terms effectively control the rate so that there is 
no asymptotic convergence. Nevertheless, there may be a 
tendency for the body of the distribution to become Gaus- 
sian. For < a < 2 (y c > 0.22), in contrast, the distribution 
will tend towards a Levy law with a power law tail exponent 
equal to a, keeping intact the original power law tail. 

This discussion can be summarized by the following 
proposition, which emphasizes that the scale factor Caa is 
"non-self-averaging" [Mezard et al, 1987]. 

Proposition 2: 

1. For pa = -2- > 1, the scale factor Caa = J2? c w i 

given in Proposition 1 (or the typical scale C^") of the dis- 
tribution of the deviations of the perturbed rates from the 
true rate converges to a finite value as N c — > co. 

2. For pa = ^ < 1, the scale factor Caa = J2f c w" 
diverges — + oo as — > co. This means that the deviations 
A\(t\Ht) of the perturbed rates from the true rate diverge as 
the duration of the catalog used to calculate them increases 
without bound. 

3. In the regime pa > 1 for which Caa converges almost 
surely to a finite value as N c — > oo, Caa remains a ran- 
dom variable sensitively dependent on the specific catalog, 
Caa being distributed according to a non-degenerate distri- 
bution. In symbols, 

Caa (Caa) as N c -> oo, (17) 

where (Caa) denotes the average of Caa over many dif- 
ferent catalogs. This means that the value Caa is catalog 
specific and changes from catalog to catalog and from real- 
ization to realization. This property is known as "lack of 
self-averaging" [Mezard et al., 1987]. 

Proof: See Appendix A4. 

The divergence of the scale factor Caa and of the devia- 
tions AX(t\Ht) described by item 2 of Proposition 2 occurs 
only for rather large magnitude errors. For instance, for 
p = 1.2 and a — 2.3, the noise level needs to be larger than 
v c = 0.52 for pa = < 1 to hold. According to our pre- 
vious analysis in Section 2, we can expect v c to lie in the 
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range 0.1 — 0.3 typically, so that the regime pa = -2— > 1 
in Proposition 2 is likely to be the most relevant one. 

In summary, for large dt, there is a trade-off between some 
averaging for small noise levels (i.e. a tendency towards a 
Gaussian shape), the dependence on the specific catalog re- 
alization (in particular, the last few events) and the power 
law tail proven in Proposition 1. These effects can be seen 
by comparing the survivor functions in Figure 12, which for 
dt — 0.001 are pure power laws for AA > 0, with Figure 13 
for dt = 10. For v c — 0.3 and u c — 0.5 (Levy regime), the 
power law remains largely intact even for the body of the 
distribution. For v c = 0.1 and v c = 0.2, the body is not well 
approximated by the power law, while the tails still show 
significantly larger outliers than expected from a Gaussian. 

We now return to the explicit values and percentages of 
the deviations for the case dt — 10. For realistic v c = 0.2, 
about 70% are off by more than 10%; 45% — 55% deviate 
by more than 20%; roughly 20% of seismic rate estimates 
over-predict the true value by more than 50%. 

In contrast to the noise levels for which a > 2, the fluctu- 
ations actually increase for v c = 0.3 and v c = 0.5 compared 
to the case dt = 0.0001. For v c = 0.5, 90% - 95% are off 
by more than 10%; 80% - 90% are off by more than 20%; 
70% — 75% are off by more than 50%; and roughly one half 
are off by more than 100%! 

To conclude this section, we restate the main results: (i) 
Seismic rates estimated from noisy data deviate strongly 
from their "true" value; (ii) If magnitude uncertainties are 
exponentially distributed (or more broadly), as we have 
shown in section 2, then these deviations are power law 
distributed in the tail with an exponent a = l/av c ; (iii) 
We believe there is no law of large numbers that can ad- 
equately describe the entire distribution of the deviations. 
However, the scale factor (or the typical size of the devia- 
tions) can be shown to diverge almost surely if pa < 1 and 
to be distributed extremely broadly if pa > 1. These results 
demonstrated rigorously have been illustrated with numeri- 
cal simulations. The next section investigates the impact of 
the propagating uncertainties on the test scores that fore- 
casts receive in earthquake predictability experiments such 
as CSEP. 

4. Impact of Magnitude Uncertainties on 
Model Forecasts and Their Evaluation in 
Consistency Tests 

Seismic rates estimated via some probabilistic model at 
time t in the future as a function of past seismicity are rou- 
tinely used as direct estimates of a forecast (the expected 
number of events at time t) [e.g. Helmstetter et al, 2005; 
Gerstenberger et al, 2005]. In rare cases, forecasts are gener- 
ated by averaging over Monte Carlo simulations which nev- 
ertheless remain parametrized by the estimated seismic rate. 
Forecasts are therefore either equal to, proportional to or 
strongly correlated with seismic rate estimates. As a con- 
sequence, they suffer from the same fluctuations as seismic 
rate estimates. 

How important are these spurious fluctuations in the eval- 
uation of forecasts in prediction experiments? Can good 
forecasts perform poorly solely due to magnitude noise? Can 
accurate models be rejected by test scores because noisy 
magnitudes influenced the rate estimation? How sensitive 
are evaluation procedures to this source of noise? 

We address these questions by designing the following nu- 
merical experiment: We test forecasts based on noisy mag- 
nitudes against a hypothetical "reality" , chosen as the rate 
based on exact magnitudes, to see whether the noisy fore- 
casts are rejected. We pretend that the exact seismic rate 
as calculated from the model equation (7) in each space- 
time bin is "reality" according to which earthquakes actu- 
ally occur. Since the exact rate is not an integer, we assume 
that observations are drawn from a Poisson distribution with 



mean equal to the seismic rate in each bin. We further call 
the noisy forecasts "models" (j = 1 . . . M). We then con- 
structed a miniature CSEP testing center which tests these 
"models" against "reality" according to their consistency 
with the "observations" in a manner entirely equivalent to 
the proposed scenario [Schorlemmer et al., 2007; Schorlem- 
mer and Gerstenberger, 2007]. 

The testing center uses the likelihood (L) and number (N) 
tests as consistency criteria for evaluating forecasts. These 
are currently used for the five year time-independent fore- 
casts which have already been submitted [Schorlemmer et 
al, 2007]. The test were used in previous studies of fore- 
casting [Kagan and Jackson, 1994, 1995] and were further 
explained by Jackson [1996]. 

We calibrated the numerical experiment to mimic Califor- 
nia as much as possible to create realistic conditions under 
which actual forecast evaluation will take place. This meant 
choosing a realistic model, realistic model parameters and 
realistic space-time bins. 

If we find that "models" (i.e. forecasts based on noisy 
data) are rejected more frequently by the consistency tests 
than according to the chosen confidence limit, we can draw 
the following conclusions: (i) the "true" model with "true" 
parameters may be rejected in a realistic one year test period 
because noisy magnitude observations affected its forecasts, 
(ii) the outcomes of the likelihood and number consistency 
tests are therefore sensitive to observational uncertainties 
that affected the generation of the forecasts. 

First we describe the simulations and their evaluation by 
likelihood methods before presenting our results. Our nu- 
merical experiment can be separated into two steps: (i) the 
simulation of the exact seismic rates in each space-time bin 
and the corresponding noisy forecasts, and (ii) the evalua- 
tion of the noisy forecasts with respect to the exact forecast 
using likelihood tests. Figure 14 graphically explains our 
numerical experiment and should be used for reference. 

4.1. Simulation of Exact Seismic Rates and Noisy 
Forecasts 

The simulation of the exact seismic rates and the noisy 
forecasts proceed according to the following steps. 

1. Choose a model, its parameters and the test 
area: The impact of magnitude uncertainties on forecasts 
will depend on the specific model and potentially its param- 
eters. We chose a spatio-temporal Poisson cluster model 
defined in equation (7) to capture the main ingredients of 
the popular short-term forecasting models. We used pa- 
rameters and a spatial test area consistent with Californian 
earthquake data to create realistic conditions (see section 
4.3 below for parameter values). 

2. Simulate a "learning" catalog: Using a set of pa- 
rameters and the model (7), we simulate a "learning" cat- 
alog from which forecasts are to be generated. Because af- 
tershocks of the cluster centers do not trigger their own af- 
tershocks, they do not affect future seismicity and do not 
need to be simulated for the "learning" catalog. Only the 
Poisson cluster center process needs to be simulated. It is 
a homogeneous space-time Poisson process of constant rate 
A c (per unit time and area). Independent magnitudes are 
drawn from the Gutenberg- Richter law (2). See Figure 15 
for an example of a simulated cluster center process. 

3. Calculate the exact daily seismic rate over one 
year in each space-time bin i: Divide the test area into 
spatial cells (bins). Use equation (7) and the original pa- 
rameters to calculate the intensity \(t,r\H£) for every day 
over the course of one year at the centers of the spatial cells. 
Past seismicity (times, locations and magnitudes) enter into 
equation (7). The rate in each space-time bin is calculated 
by multiplying the rate at the center by the area of the spa- 
tial cell for simplicity. Pretend these rates are "reality" : the 
number of actually observed events is drawn from a Poisson 
distribution with mean equal to this rate (see below). 
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4. Create M noisy replicas of the "learning" cat- 
alog by perturbing the magnitudes: Generate j — 
1 . . . M different noisy replicas of the original catalog by 
adding random noise to each magnitude. The random noise 
is simulated from the Laplace distribution defined in (1) 
characterized by the scale parameter v c . 

5. Generate M "noisy" forecasts based on the M 
noisy catalogs: Using equation (7) and the original param- 
eters, calculate the daily seismic rate for the whole year in 
the same manner as the exact rate but using the noisy cat- 
alogs. Denote these noisy forecasts as "models" j = 1 . . . M 
which we want to compare to the exact seismic rate in a 
mock CSEP/RELM testing center. 

4.2. Evaluation of the Noisy Forecasts in a Hypothetical 
RELM/CSEP Testing Center 

In the evaluation of the "models" (or noisy forecasts), we 
follow the recipe of the RELM/CSEP testing procedures, as 
described below. But first we address an important issue 
regarding our choice not to use magnitude bins. 

When we simulate the learning catalog above, we must 
choose a magnitude threshold. We set nid = 4 as for the pro- 
posed daily forecast competition. Magnitudes for the simu- 
lated cluster center events were drawn from the Gutenberg- 
Richter distribution with rrid = 4. We then perturbed this 
catalog according to a symmetric Laplacian distribution. As 
a result, some of the simulated cluster centers now have 
magnitudes that are smaller than the magnitude thresh- 
old md = 4. We note that their influence is very small 
in the calculation of the noisy seismic rates (forecasts) with 
respect to the events above the threshold because of the 
exponential dependence of the productivity on the differ- 
ence exp(a(m» c — ma))- Rather than re-applying the cut-off 
ma = 4 after the perturbation, we kept the number of events 
in the learning catalog fixed. We therefore also did not take 
into account the possibility of magnitudes originally below 
the threshold becoming "visible" above the threshold due to 
the addition of noise. These more realistic conditions should 
be pursued as a next step. In our scenario, the mean mag- 
nitude before and after the perturbation remains constant. 
Any results in the performance of the forecasts are therefore 
not due to an overall shift in the magnitude distribution. 

As a consequence, we did not introduce magnitude bins at 
the testing stage. Rather, we assumed that the seismic rates 
and the forecasts were for one and the same large magnitude 
bin. This also meant that we did not need to generate mag- 
nitudes of the aftershocks of the cluster centers, neither of 
the exact catalog nor of the noisy catalogs. This helped the 
numerical efficiency of the computer program. Furthermore, 
by not testing the magnitude values, we effectively assumed 
that all simulated magnitudes were completely consistent 
with the "observed" ones, as the sum over the likelihood val- 
ues of the magnitude bins equals 1. In other words, we are 
less stringent than the CSEP/RELM center by not rejecting 
models because of under- or over-predicted magnitudes. 

The CSEP/RELM testing procedure acknowledges the 
existence of observational uncertainties in the observed 
earthquake parameters such as magnitude, location or fo- 
cal parameters. The test center therefore creates so-called 
"modified" observations from the actually observed earth- 
quake parameters by sampling parameters that are consis- 
tent with the observed ones and their uncertainties. In this 
way, many alternative "realities" or observations are cre- 
ated that are consistent with the actual observations and 
their uncertainties. The forecasts are tested against each 
alternative reality and the final rejection is based on the 
average performance of the forecast against all alternative 
"realities" . 

In our hypothetical center, we should therefore create 
many "modified" observations consistent with the actual 
observation in the same manner. However, as just stated 



above, we did not actually generate magnitudes of the events 
and do not test a forecast's ability to predict magnitudes. 
The magnitude dimension is effectively eliminated from the 
evaluation phase by assuming that all simulated magnitudes 
are consistent with the "observed" ones. 

At the same time, we did generate "modified" observa- 
tions in each space-time bin according to the following logic. 
The CSEP/RELM test center generates so-called "simu- 
lated" observations from a model's forecast that are consis- 
tent with the forecast by drawing numbers from a Poisson 
distribution with mean equal to the forecast in each space- 
time bin. This assumes that earthquakes are independent 
and occur according to a Poisson process in each space-time 
bin. This is not true for the Earth. However, we decided 
to assume that our "reality" did indeed occur according to 
a Poisson process in each bin so that we create favorable 
conditions for the test to work and to be sure that "models" 
were not rejected because they violate this assumption of 
the test. 

We therefore created many alternative realizations of ob- 
served earthquakes that are consistent with the exact seis- 
mic rate in each bin according to a Poisson distribution 
with mean equal to the seismic rate. In analogy with the 
RELM/CSEP terminology, we called these alternative "real- 
ities" modified observations. For the Earth, these favorable 
conditions do not hold and may also influence whether a 
model is rejected. 

With reference to Figure 14, the various steps of the 
RELM/CSEP testing procedure are described below. We 
follow the notation of Schorlemmer et al. [2007]. 

1. For each of the {j — 1...M} "models", gen- 
erate {k — l...m} "simulated" observations: The 
RELM/CSEP test center assumes that earthquakes are in- 
dependent in each space-time bin i and generates Poisson 
distributed random variables w J k i with mean equal to the 
model's forecast \\ in that bin. This is repeated m-times 
to generate {k = 1 . . . m} sets of simulated observations for 
each model. 

2. For each of the M "models", calculate the like- 
lihoods of the m simulated observations: Assume that 
the "model" is true and calculate the likelihood of each of 
the {k = 1 . . . m} sets of observations generated from its 
forecast A^ and sum over all space-time bins. These are the 
simulated likelihood values {L 3 k , k = 1 . . . m} consistent with 
a particular model j which will be tested against the likeli- 
hood of the observations below. The simulated likelihood of 
the jth model of the kth observation set is given by: 
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3. Generate {q — 1 . . . s} "modified" observations 
("alternative realities") from the exact seismic rate: 

In each space-time bin i, generate a "modified" observation 
wl by drawing a random variable from a Poisson distribu- 
tion with mean equal to the exact seismic rate in that bin. 
Repeat s-times to generate s sets of "modified observations" 
against each of which each "model" is tested. 

4. For each of the M "models", calculate the likeli- 
hoods of the s "modified" observations: Assume that 
the jth "model" is true and calculate the likelihood U q of 
observing each of the {q = 1 . . . s} "modified" observations 
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5. For each of the M "models"^ calculate the frac- 
tion 7^ of simulated likelihoods L k less than each of 
the s observed likelihoods L 3 q : For each model j and 
each set q of modified observations, compute the fraction 
7q of simulated likelihoods {L k ,k = 1 . . . m} less than the 
observed likelihood L q . Repeat for each of the q = 1 . . . s ob- 
served likelihoods for each of the M "models" . The fraction 
7^ measures whether an observed likelihood value is consis- 
tent with the values expected from a particular "model" . 

6. For each of the M "models", calculate the mean 
fraction (j 3 ) from the s values of j q : Because many 
"modified" observations are consistent with our chosen "re- 
ality" , we average the fraction 7 J over all sets of "modified" 
observations for each model (noisy forecast). The mean frac- 
tion (7 J ) measures whether the observed likelihood values 
fall within the center of the simulated likelihoods given a 
particular model. 

7. For each of the M "models", calculate the sim- 
ulated total number of events N 3 k from each of the 
{k — 1 . . . m} simulated observations: Generate the dis- 
tribution of the simulated total number of events in the test 
region consistent with each model. 

8. Calculate the "modified" total number of 
events N q from each of the {q = 1 . . . s} modified ob- 
servations: For each of the possible "realities" or modified 
observations, sum the modified observations over all bins to 
obtain the modified total number of events N q . 

9. For each of the M "models", calculate the frac- 
tion S q of simulated total numbers of events N 3 k less 
than the modified number of events N q for each of 
the {g = 1 . . . s} modified observations: Compute the 
fraction S q of simulated numbers of events {N k , k = 1 . . . m} 
less than the qth modified total number of events N q for each 
modified observation and each model. The fraction 8 3 q mea- 
sures whether the simulated numbers of events are consistent 
with the observed number. 

10. For each of the M "models", calculate the 
mean fraction (S 3 ) from the s values of S q : Again, 
because many modified observations are consistent with the 
actual observation, average the fraction S q over all of its s 
values for each model (noisy forecast). The mean fraction 
(8 3 } measures whether the modified observations are on av- 
erage consistent with the simulations. 

11. Perform L test: Reject model j if (-y J ) < 0.05. 
This indicates that the observed likelihood values are incon- 
sistent with the model. According to Schorlemmer et al. 
(2007), the L test is one-sided. 

12. Perform N test: Reject model j if (S j ) < 0.05 or 
if (6 3 } > 0.95. This indicates that the observed number of 
events are inconsistent with the model. 



4.3. Test Area and Model Parameter Choices 

To simulate catalogs and generate exact forecasts, we 
needed to choose the spatial test area, the number of spatial 
bins, the parameters of the model and the temporal bins (for 
which forecasts are issued and evaluated) . To mimic a daily 
forecast competition in California in an earthquake predic- 
tion experiment such as RELM, we chose temporal bins of 
one day and issued forecasts over the course of an entire year 
always using all information available up to the day of the 
forecast. We assumed a square spatial area of 700 km by 
700 km to approximate the size of California. 

To generate synthetic catalogs and forecasts, we needed 
to decide on a set of parameters 6 = {f3, m d , A c , k, a, c, p, n} 
for the model (7). We decided on ma — 4 to mimic the 
proposed RELM/CSEP daily forecast competition and on 
P = 2.3. Inverting the parameters of the Poisson cluster 



process requires knowledge of the entire branching struc- 
ture (being able to identify which events are cluster cen- 
ters and which are their dependent aftershocks). Lacking 
such knowledge, the maximization of the likelihood must 
be performed over all possible branching structures. Given 
the complexity, we decided instead to use parameter values 
based on an ETAS model inversion for southern California 
of Helmstetter et al. [2006, model 2 in Table 1] and ad- 
justed them appropriately for our model, magnitude thresh- 
old and spatial test area. Their background rate was multi- 
plied by 10 - to adjust to the higher magnitude threshold, 
and multiplied by 3 for the increased spatial area, result- 
ing in A c = 0.063. We also multiplied k by 10~ to ob- 
tain k = 0.013. We used their other parameters without 
change {a = 0.43 x log(10), c = 0.0035, p = 1.19,// = 2}. We 
checked that the simulated total number of expected events 
per year over the entire region given these parameter choices 
was comparable to yearly rates above M4 in California over 
the last twenty years (from about 50 up to 250) . 

Because of computational limitations, we had to restrict 
the number of spatial bins to 30 by 30 and the number of 
perturbations of the original catalog to M=10. However, we 
were still able to generate m = 1000 simulated observations 
for each noisy forecast and s — 1000 modified observations 
for the exact forecast. 

In this article, we aim to establish only whether examples 
exist in which good models are rejected based solely on real- 
istic magnitude uncertainties. A more in-depth study which 
explores the model, parameter and bin space is certainly re- 
quired to establish robust confidence limits for testing the 
consistency of model forecasts with observations. 

4.4. Simulations and Results 

Figure 15 shows an example of a simulated cluster center 
catalog: the top panel shows the spatial distribution of the 
cluster centers in the spatial test area 700 km by 700 km. 
The middle panel shows the magnitudes of the cluster cen- 
ters against time in days over the course of one year. The 
bottom panel shows the rate for both cluster centers and 
their aftershocks calculated from the model (7), summed 
over all spatial bins. 

We checked that setting the noise level to zero v a = 0.0 
does not result in any "models" being rejected. Table 1 
shows the results of the mock RELM/CSEP evaluation of 
the daily forecasts of 10 unperturbed forecasts over the pe- 
riod of one year. None of the "models" are rejected by the L 
and N tests. Apart from the mean fractions (6 3 ) and (7 J ), we 
also calculated their standard deviations a 1 and as. Table 
1 shows both fractions to be right in the middle of the sim- 
ulated distributions, indicating strong consistency between 
the models and the observations, as should be expected. 

Introducing just a little bit of noise changes the situation. 
For v c = 0.1, we found that the N test rejects 2 models at 
the 90 % confidence limit (see Table 2). We found that the 
difference between the simulated total number of expected 
events of a model and the actual expected number based on 
the exact rate was a good indicator for the model's perfor- 
mance in the N test, as should be expected. In keeping with 
the statement by Schorlemmer et al. [2007] that the L test 
is one-sided, we do not reject models for which (y 3 ) > 0.95. 
In this case, the (two-sided) L test would have rejected the 
same models as the N test. Because rejecting 2 models out 
of 10 at 90% confidence may simply be the expected false 
negative errors, we additionally performed simulations for 
50 models. Of these, 8 models were rejected by the N test, 
none by the L test. Thus in total, 10 models out of 60 mod- 
els were rejected, indicating slightly higher than expected 
rejections at 90% confidence. 

Table 3 shows the results for simulations with the noise 
level set to v c = 0.2. The cluster center process of this 
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simulation is shown in Figure 15 along with the calculated 
seismic rate according to model (6). The N test rejects 9 
out of 10 models because they over-predict the number of 
observed events. A two-sided L test would have rejected 3 
models, but the one-sided L test does not reject any models. 
In contrast to the case for v c = 0.1, the fluctuations strongly 
impact the forecasts. 

Results for simulations with noise level v c = 0.3 are sum- 
marized in Table 4. Note that the values of (7 3 ) are now 
fluctuating very strongly. The one-sided L test now rejects 
one model while the N test rejects 7 models. Clearly, the 
noisy forecasts are no longer consistent with the observed 
likelihood values and numbers of events. Note that model 
5 predicts more than 10 times the actually expected num- 
ber of events, reflecting the extreme fluctuations we proved 
in the previous sections. It is also interesting that the L 
test rejects model 2, which passes the N test. This shows 
that the daily expected numbers can fluctuate but in a sense 
"average out" over the course of one year, but the likelihood 
scores of each day keep a "memory" of the bad predictions. 
This exemplifies the complementary properties of the two 
tests. 

The case v c — 0.5 is shown in Table 5. All models are 
rejected by the N test, indicating systematically larger fore- 
casts. 

These simulations show that, as u c increases, the models 
tend to forecast numbers of events that are larger than the 
true value: the larger v c , the larger the effect, and therefore 
the more probable the rejection by the N test. This results 
from the fact that, while the distribution (1) of magnitude 
errors is found (and assumed in our simulations) to be ap- 
proximately symmetric, the impact of a magnitude error is 
strongly asymmetric when comparing negative and positive 
deviations from the true magnitude, due to the exponential 
dependence of the productivity law (3). 

Because the tests rejected more models for v c = 0.2 than 
for v c — 0.3, we believe that we are not sampling the ac- 
tual fluctuations with 10 models. More models (perturbed 
catalogs) are needed to characterize precisely how the con- 
fidence limits of the tests are affected. In this article, we 
showed simply that the stated confidence limits as we would 
like to interpret them (that the model is inconsistent with 
90 % confidence) are clearly not adequate. 

4.5. Discussion of Mock RELM/CSEP Predictability 
Experiment 

The results from the mock RELM/CSEP evaluation of 
forecasts generated from noisy data against the exact seis- 
mic rate based on exact data indicate that noisy fore- 
casts fluctuate wildly and may therefore be rejected by the 
RELM/CSEP consistency tests. Even conservative levels of 
magnitude noise of v c — 0.2 impact forecasts so strongly 
that they are easily rejected by the likelihood and number 
tests. In other words, the L and N tests are sensitive to ob- 
servational uncertainties that entered in the creation of the 
forecasts. As a consequence, when considering actual results 
from L and N tests based on comparing a real model with 
real data, one should keep in mind the possibility that the 
forecast contains noise which may influence severely the per- 
formance of a short-term model. The supposed confidence 
limits may be misleading as they do not take into account 
uncertainties in the forecast. Some models may be rejected 
purely because of forecasts generated from noisy earthquake 
catalogs, while others may appear to be consistent with the 
data (more often than expected given the RELM/CSEP con- 
fidence limits). 

We do not expect these "wrongful" rejections to stop if 
the tests are performed over a longer period of time. Each 
day is separately scored according to forecast and observa- 
tions and the daily forecast will always be strongly fluctu- 
ating. Extending the evaluation period to two years, for 
instance, would not solve the problem. 

We emphasize that we have completely isolated the effect 
of magnitude uncertainties and assumed everything else to 



be known. We have shown that, in this scenario, magnitude 
uncertainties lead to strongly fluctuating forecasts. While 
a comprehensive study of uncertainty in data, parameters, 
forecasts, observations and their trade-offs should be en- 
couraged, we expect that there will be no simple formula 
to "correct" the forecasts, tests or interpretations. 

Rather, the propagation of data and parameter uncertain- 
ties needs to be carefully examined in each specific model 
and accounted for in the forecasts. The resulting distribu- 
tion of forecasts can most likely not be captured by one value 
such as the expectation. In fact, we have shown above that, 
depending on the noise, we should expect extremely large 
variations that cannot be represented by one number per 
bin. 

The L and N tests both assume that earthquakes are 
independent in each space-time bin and that observations 
consistent with the forecast are Poisson distributed random 
variables. In this article, we did not study the implications 
of the first assumption. It would seem, however, that the 
assumption of independence will be strongly violated dur- 
ing active aftershock sequences - the days when clustering 
models can actually be tested on observed clustering. 

But from the standpoint of our results, the second as- 
sumption (that observations consistent with a model are 
Poisson distributed with mean equal to the forecast) may 
need to be relaxed because of uncertainties in the forecasts, 
whether due to noisy magnitudes used to generate forecasts, 
parameter uncertainties or other sources. Instead, models 
will need to provide the entire likelihood distribution in each 
space-time bin. Apart from likelihood and number consis- 
tency tests, methods for alarm-based earthquake predictions 
are also equipped to deal with full forecast distributions [see, 
e.g., Zechar and Jordan, 2007]. 

There is a second reason for allowing forecasts to be speci- 
fied as full distributions. The idiosyncrasies of a model may 
cause consistent observations to be distributed completely 
differently than a Poisson distribution. While a certain ac- 
tual observation may not be consistent with a Poisson dis- 
tribution given the mean rate of a model forecast, the ob- 
servation may still be consistent with the model. Specifying 
the entire distribution of a forecast is computationally much 
more demanding, but it is the only way to guarantee that 
forecasts accurately reflect the scientific hypotheses of the 
model along with all sources of uncertainties involved in the 
generation of the forecast. 

The power of the "non-Poisson" L and N tests may appear 
weaker because less models are rejected, but this is simply 
a reflection of the potential stochasticity of the model and 
its real uncertainties. If the tests are adjusted to the dis- 
tribution of each forecast, then the confidence limits can be 
interpreted appropriately. 

A trivial way for a model to pass the "non-Poisson" con- 
sistency tests would be to specify extremely broad distribu- 
tions so that whatever is observed falls into the center of the 
distribution. Most clustering models are in fact already very 
broadly distributed so that this may reflect some scientific 
truth about seismicity. But in case these distributions are 
too broad, there exists also the likelihood ratio test, which 
compares models against each other and would be able to 
reject overly "dilute" forecasts against peaked ones that are 
accurate. 

Before concluding, we briefly mention two techniques that 
seem suitable for generating the entire distribution of fore- 
casts from a model. The first is a simple simulation-based 
method and is essentially an extension of the method by 
Rhoades et al. [1994] from renewal processes to clustering 
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models. The idea is to acknowledge that data and param- 
eters are uncertain and hence distributed and then to re- 
peatedly sample parameters and data randomly from these 
distributions to generate many forecasts. 

The simulation-based method is simple but computation- 
ally expensive. Furthermore, past forecasts of the model are 
"thrown away" whenever new observations become avail- 
able. A second method, data assimilation, provides an op- 
timal and more efficient solution by making use of all avail- 
able information, including previous model forecasts [see 
e.g. Kalnay, 2003] . The goal of data assimilation is to esti- 
mate the state of the physical system (and/or parameters) 
through a statistical combination of the noisy observation 
and the distributed model forecast according to Bayes' the- 
orem. The state (e.g. past earthquake data and parameters 
of the model) are sequentially updated through time by cor- 
recting the model forecast (the prior) with the observations 
(the likelihood). 

5. Conclusions 

In this article, we analyzed magnitude uncertainties and 
their impact on seismic rate estimates in short-term clus- 
tering models, on their forecasts and on their evaluation in 
predictability experiments such as RELM or CSEP. In the 
first part, we quantified magnitude uncertainties. We esti- 
mated moment magnitude uncertainties by comparing the 
estimates for the same events from the CMT and USGS 
MT catalogs. We found that a double-sided exponential 
(Laplace) distribution with a scale parameter 0.1 fit the dis- 
tribution of the estimate differences significantly better than 
a Gaussian, reflecting the higher probability of outliers. If 
the distributions of independent, individual magnitude un- 
certainties decay much more slowly than a Gaussian, they 
have at least exponential or fatter-than-exponential tails. 
We also analyzed MAD values, a measure of magnitude un- 
certainty, reported by the NCSN in its authoritative region 
of the ANSS. We found that MAD estimates below 0.1 may 
be unreliable. Typical values were between 0.1 and 0.3 but 
outliers occur often. 

Because short-term seismicity models indiscriminately 
use any listed magnitude in earthquake catalogs for seismic 
rate projections, inter-magnitude uncertainties reflect the 
true errors better. We compared the CMT moment magni- 
tudes with the PDE body and/or surface wave magnitudes 
and found scatter with standard deviations of 0.29 and 0.26, 
respectively. We further found that the NCSN local and 
coda duration magnitude estimates for the same events fit 
a Laplace distribution with scale parameter 0.2 better than 
a Gaussian (with standard deviation 0.3). 

The relative lack of available quantitative magnitude un- 
certainty estimates coupled with their importance for hy- 
pothesis testing underscore the need for increased (funding 
for) data quality assessment and control by network opera- 
tors. 

In the second part, we studied the impact of magnitude 
noise on seismic rate projections in a simple clustering model 
that captures the main ingredients of popular short-term 
models. We proved that seismic rate estimates based on 
noisy catalog data deviate from their exact rate by power law 
fluctuations in the tail with exponent a = (aVc)^ 1 , where a 
is the exponent of the productivity law of aftershocks and 
v c is the scale parameter of the Laplace distribution of the 
magnitude noise. Thus seismic rate projections and fore- 
casts can fluctuate extremely due to magnitude noise. We 
further proved that the scale factor Caa, which character- 
izes the typical scale of the fluctuations, remains a random 
variable and does not converge to a unique, fixed constant. 
Rather, there are subtle trade-offs between the power law 



tail, a tendency for the sum of random variables to con- 
verge to its stable law (Gaussian or Levy) and the strong 
quenched disorder due to particular catalog realizations and 
the stochasticity of the model. 

In the last part, we studied how forecasts based on noisy 
data would fare in RELM/CSEP predictability experiments. 
We conducted a numerical experiment in which we con- 
structed a hypothetical testing center and performed a one 
year test of daily forecasts. We assumed that earthquakes 
happen according to the seismic rate of a simple clustering 
model calibrated on an exact catalog data set. We then 
perturbed the catalog but used the same model to generate 
forecasts from the noisy data. These were submitted to the 
mock testing center as "models" that were tested for the 
consistency with the hypothetical observations. We found 
that noisy forecasts were rejected much more frequently 
than would be expected for a given confidence limit. We 
concluded that the current RELM number and likelihood 
consistency tests were sensitive to noisy forecasts and could 
wrongly reject the "true" model due to magnitude noise. 

To robustly reject models at specified confidence limit, 
tests cannot assume that observations consistent with a 
model are Poisson distributed about its mean rate fore- 
cast. To properly capture the idiosyncrasies of each model 
together with all propagating uncertainties, the forecasts 
need to specify the entire distribution for each space-time- 
magnitude bin. Based on our results that forecasts are power 
law distributed, we expect the deviations from a Poisson 
distribution to be severe. We noted that data assimilation 
techniques were particularly useful for propagating entire 
probability distributions while taking into account all un- 
certainties. 

Appendix A: Appendix 

Al. The Deviation of the Perturbed Rate from the 
True Rate as a Sum of Weighted Random Variables 

This section shows how the deviation of the perturbed 
rate due to noisy magnitudes from the true rate can be writ- 
ten as a sum over weighted random variables. 

The perturbed rate is given by 



\ p {t\H c t ) = \ c + J2 



i c \t ic <t 

while the true rate is given by 



£ e a ( m i c - m d) 

(t - U e + c)" 



(Al) 



A«(t|fl t c ) = A c + J2 



i c \ti <t 



(t - t ic + c)p 



(A2) 



where m° = m* + e and p e (e) is the distribution of the noise 
given by 



p e(e) = J_ e (-^) 



(A3) 



Hence, given any catalog realization H£ (of cluster centers), 
the deviation of the perturbed rate from the true rate is: 

A\(t\m, e) = \ p {t\m, e) - \(t\m, 6) 

k e a ( m i c - m d) fa e a ( m i c - m d) 



= E 



»o|t« c <i 



it - t ic + c)p (t - t ic + c)p 



- E 



k e 



e a ( m i r - m d) 



it - t tc + c)p (t - t ic + c)p 
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= E 

i c \t ic <t 

= E 

ic\ti,<t 



k e a(m\ c -m d ) 



(e" e ' - 1) 



Wi ■ Zi 



(A4) 



where the last equality expresses the deviation as a sum over 
a product of two terms: a quenched weight w, (i.e.. which 
is fixed for a given catalog but unknowable) 



k e a ( m i c - m d) 

(t - t ic + c)p 



(A5) 



and a random variable Zi 



Ca\- Equation (A4) shows that AA can be written as a fi- 
nite sum of weighted random variables z, where the z are dis- 
tributed according to (A7). The proof follows in two steps. 
First, we show that z is regularly varying. Second, we invoke 
the result that the sum of weighted, regularly varying vari- 
ables is equally regularly varying in the tail with the same 
exponent. We will frequently refer to the rigorous Jessen 
and Mikosch [2006] (hereafter JM), but the definitions and 
proofs can equally be found in other sources. Sornette [2004] 
provides a heuristic and intuitive development of the results 



DEFINITION 2.1 of JM: One- dimensional regularly 
varying random variables X with distribution function 
P(X > x) are defined by 



1 



(A6) P(X >x)~ q'x- a L{x) 



The weight Wi measures the influence of the ith cluster cen- 
ter according to its magnitude m* c through the productivity 
law p(rrii c ) = exp(a(m' c — m<j)) and its occurrence time ac- 
cording to the Omori-Utsu law <f>(t — U c ) = (t — U + c)~ p . 
The weights Wi thus depend sensitively on the specific cat- 
alog realization {m ic , ii }i<; c <jv c and the parameters 9. 

The weights Wi are "quenched" or "frozen" because they 
are fixed for a realization but result from a random process. 
In statistical physics of spin glasses (a similar situation be- 
cause of the frozen random variables), there are two types 
of disorder that are treated differently: quenched disorder, 
where the average is taken over the logarithm of the partition 
function; and annealed disorder, where the average is taken 
directly over the partition function. The latter case would 
correspond in our context to looking at the full distribution 
of weights, rather than assuming they are fixed. However, 
we are interested in the fluctuations of the perturbed rate 
given a fixed catalog. 

The random variables z t = (exp(ae;) — 1) modulate the 
weights due to the magnitude noise e*. Without noise, e = 
and hence Zi = so that AA = 0. Their distribution is the 
subject of the next section. 

A2. The Distribution of the Random Variables z 

Using the distribution of e from equation (1), we can de- 
termine the distribution of the zf. 



Pz(z) =Pe{t) 



1 „-e/^c 
2v c c 



2ai> c (z+1) 
1 

2av c (z+l) 

a 1 

2 ' 

a 1 

2 ' 

a/2 



2v c 

(- log(z+l)/oi/ c ) 
O (log(z+l)/ai/ c ) 



dz I ' 
de I 
dz I ' 



( + ) 
(-) 



< z < oo 
-1 < z < 

< z < oo 
-1< z < 



< e < oo 
-oo < e < 

< z < oo 

1 < z < 

where a = — !— 



and P(X < 
+ q" = l 



-x) 



q x 



*L(x) 
(A8) 



where L is a slowly varying function, i.e. L(cx) / L(c) — » 1 
as x — > oo for every c > 0. Condition (A8) is also referred 
to as a tail balance condition. The cases q' = or q" — are 
not excluded. Here and in what follows we write f(x) ~ g(x) 
as x — > oo if f(x)/g(x) — » 1. 

z is power-law distributed with exponent a for z > so 
that the slowly varying function L(z) is simply a constant. 
For z < 0, the power law is truncated at — 1, so that q" — 0. 
Hence the tail balance condition is easily verified for the pos- 
itive tail of z. Therefore, the tail of z is regularly varying 
with index a. This concludes the first part of the proof. 

To prove that the finite, weighted sum of regularly vary- 
ing variables z is also a regularly varying function with the 
same exponent, we invoke Lemma 3.3 of JM: 

LEMMA 3.3 of JM: Let (Zi) be an independently and 
identically distributed sequence of regularly varying random 
variables satisfying the tail balance condition (A8). Then 
for any real constants ipi and m > 1, 



(A7) 



P(V>iZi+ • • • + ip m Z m > x) 

m 

~ p(\z 1 \>x)j2W^tr+<i"(i'n a ] (A9) 



where ipf and ip i are defined by P(%piZi > x) = P{ipf Zf > 
x) + Zi > x) where x = V (±x) (where V means 

"or"). 

In our case, the constants ipi are given by the weights Wi. 
Plugging in q" — (i.e. q' — 1) from above and using our 
notation, we have shown that 



Figure 10 shows a double logarithmic plot of the pdf 
of the random variable z for several choices of the noise 
scale parameter v c — (0.1,0.2,0.3,0.4,0.5). We as- 
sumed a — ln(10) = 2.3 so that the exponent a = 
(4.34, 2.17, 1.45, 1.09, 0.87), respectively. 

A3. Proof of Proposition 1 

In this section, we prove that the deviation AX(t\H t ) of 
the perturbed seismic rates from the true seismic rate due 
to magnitude noise is a random variable with a distribution 
having a power law tail with exponent a and scale factor 



p(A\)~p z (z)-j2(^r 

Denoting Caa = X)f°(wi) a , we have shown that 



(A10) 



p(AA) 



(AA)!+° 



for AA -> oo (All) 



which completes the proof of Proposition 1. 
A4. Proof of Proposition 2 
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We first state well-established results for a slightly differ- 
ent definition of the scale factor (denoted by C(t), where the 
time t of evaluation is fixed) for which Proposition 2 can be 
easily proven (section A4.1). In section A4.2, we consider 
the more difficult case for our definition of the scale factor 
(C(N) where the number of cluster centers is fixed). 
A4.1. Results for Fixed-Time Scale Factor C(t) 

Recall that the scale factor is defined by: 



c AX = ]T 



^ (t-ti + cy°< 



(A12) 



When we simulate catalogs, perturb them and calculate the 
differences between the perturbed rates and the true rate, 
we are interested in the deviations from the true rate given 
a fixed number of events. We did not constrain the time t 
to a fixed value because a more useful and practical result 
would be the deviations for a fixed number of events. We 
therefore let t adjust according to when the Nth main shock 
happened. We then evaluated the rates at time t = tjv + dt 
just after the Nth main shock. Since the Nth occurrence 
time is random, t is therefore also random (for different cat- 
alogs or realizations of the Poisson process). Let us denote 
our scale factor by C(N) to stress the fact that the number 
of main shocks is fixed, not the time t. 

On the other hand, a wealth of results is available for 
the scale factor C(t), for which the time t is fixed and N 
fluctuates. The scale factor is then defined by: 



to the truncation with an exponent /3/a which lies in the 
range 1 < /3/a < 2. Therefore, the amplitudes fluctuate as 
if power-law distributed until the sampling actually "feels" 
the corner magnitudes. For specific regions of the world, 
it may take millenia for these corner magnitudes to occur. 
Therefore, while mathematically all moments of the stochas- 
tic amplitude are finite, fluctuations will be power-law like 
with infinite variance until the upper truncation is actually 
felt. 

Using results from Rice [1944] and Lowen and Teich 
[1990], we now prove the three elements of Proposition 2 
for the fixed-time scale factor C(t): 

1. First, we show that C(t) < oo almost surely (a.s.) 
even as t — > co if pa > 1. For this, we only need to 
demonstrate that all cumulants c n of C(t) given by (A15) or 
(A16) are finite. We already stated that the moments of the 
stochastic amplitudes are mathematically finite because the 
Gutenberg-Richter distribution is truncated. Furthermore, 
by definition 8 — pa > 1 > 1/nVn, and A — c > 0. There- 
fore, c n < oo V n, which in turn implies that C(t) < oo a.s. 
V t. 

2. Second, we show that C(t) diverges a.s. as t — > oo if 
pa < 1. To prove this, we will bound C(t) from below and 
show that this lower bound diverges a.s. As noted above, 
only magnitudes down to rrid are included in the process. 
Therefore, the smallest productivity is given by k a . We cre- 
ate a lower bound for C(t) by replacing all productivities by 
their lower bound k a , i.e. 



(A13) 



N c (t) 

C(t) > D(t) = Yl 



(t-ti + c)p 



(A17) 



where N(t) is a now random variable for fixed t. Now we 
make the crucial identification of the scale factor C(t) as 
the intensity of power law shot noise [e.g. Lowen and Teich, 
1990], defined in general by 



Now the process D(t) has moment generating function Q(s) 
that is equal to equation (Al) in Lowen and Teich [1990, Ap- 
pendix A]. There, the authors show that, for 8 < 1, Q(s) = 1 
for s = and zero otherwise, so that 



I(t)= E K i h (t-tj) (A14) 
i\tj<t 

where I(t) is the "current" or noise, tj are Poisson occur- 
rence times with rate A < oo, the Kj are i.i.d. stochastic 
amplitudes and the "impulse" function h(t) = t~ 5 is an 
inverse power law function on the interval [A, B] and zero 
otherwise. These correspond exactly to the fixed-time scale 
factor C(t), the main shock occurrence times at rate A c , the 
productivities k a exp(— aa(mj — mj)) and the Omori-likc 
decay (t — tj + c)~ pa , respectively. Note that in our case 
A — c, B = oo and 8 — ap. 

The cumulants c n of I(t), which determine the moments 
of the shot noise, are given by the following equations [Rice, 
1945]. For 8^ 1/n: 

a 1 — n<5 T>l — n5 

c n = \{K n ) x n& ° x (A15) 

while for 8 — 1/n: 

c n = \{K n ) x ]n{B/A) (A16) 

The nth cumulant is hence infinite if the nth moment (K n ) 
of the stochastic amplitudes is infinite, if A = and 8 > 1/n, 
or if B = oo and 8 < 1/n. 

Since earthquakes cannot have infinite moment (magni- 
tude), their distribution is truncated and hence all moments 
of the stochastic amplitudes are finite: (K n ) < oo Vn. 
However, the productivities are power law distributed up 



Pr{D t < x} = 0, for all x < oo 

which proves the a.s. divergence of C(i) as t — » oo. 

3. Third, in the regime pa > 1 for which C(t) < oo al- 
most surely, we show that C(t) remains a random variable 
with a non-degenerate distribution. For this, we only need 
to state that the variance of C(t) is non-zero as t — > oo. We 
can actually calculate the variance explicitly, being equal to 
the second cumulant C2 given by equation (A15): 

Var(Ct) = ,„ Xc{ f*l T (A18) 

v ' (2ap - 1)c 2 «p- 1 

As stated above, the second moment of the amplitude is 
mathematically finite. However, if the earthquake catalog 
under study does not actually sample the upper magnitude 
cut-off, the variance will behave as if infinite. Thus, not only 
is C(t) a random variable, it fluctuates wildly. Sampling the 
corner magnitude may take hundreds to thousands of years 
even in relatively active regions like California. To get a 
sense of the numbers, set A c = 0.01, {K } = 1 and ap — 2 
for simplicity, neglecting for a moment the large second mo- 
ment of K. For these values, the variance is on the order 
of 10 6 , far larger than typical earthquake rates of e.g. 1 per 
day above nrid — 4 in California. This completes the proof 
of Proposition 2 for the fixed-time scale factor C(t). 

In fact, more results are known about the statistical prop- 
erties of C(t) [e.g. Lowen and Teich, 1990, Figure 3, and ref- 
erences therein] . If A > and 8 > 1 so that cumulants exist 
(assuming the stochastic amplitudes have finite moments), 



X - 16 



WERNER AND SORNETTE: MAGNITUDE UNCERTAINTIES AND FORECASTS 



then in the limit of infinite Poisson driving rate A — > oo, the 
intensity C(t) is distributed according to a Gaussian with 
mean equal to the first cumulant and variance equal to the 
second cumulant. Even in this limit (which is not directly 
relevant to earthquakes since there A is small) the variance 
remains huge. Furthermore, if A = and 8 > 1, the dis- 
tribution of C(t) is Levy-stable for all Poisson rates with 
exponent 1/8. Since for earthquakes, A = c is very small, 
we expect the distribution of C(t) to be close to Levy with 
exponent 1/ap = av c /p. For reasonable values a — 2.3, 
v c = 0.2 and p = 1.2, this results in an extremely small 
Levy exponent 0.4. 

A4.2. Results for Fixed-Number Scale Factor C(N) 

Results do not seem widely established for the fixed- 
number scale factor. Note, however, that both the mean 
number and the variance of the number of events in a Pois- 
son process diverge as t — > oo. The higher moments of 
degree n of the Poisson process are Touchard polynomials 
of degree n of the variable Xt. For t — > oo, they also di- 
verge. Furthermore, the probability of having a finite num- 
ber k of events in an infinite interval Pr{ A(0, T] = k} — 
(XT) k exp(— \T)/k\ is zero for T — > oo. And vice versa, the 
probability of having an infinite number of events k = oo 
in a finite interval T is equally zero. Therefore, the limit 
t — > oo and N — -> oo are equivalent so that we expect the 
same results to hold for C(N) as for C(t). Without a more 
formal statement of this equivalence, however, we proceed 
to consider separately C(N) as A — > oo. 

Let us rewrite expression (16) for C(N) explicitly as 



_ N _ aa(m\-m d ) 

C (N) = y^ — . 



(A19) 



where c' = c + dt is a constant since t = ijv + dt. We bound 
C(N) from below and from above by noting that 

k a < k a e aa{m ^- md) < k a e aa(M - md) . (A20) 



since m d < m* < M, where M is an upper magnitude 
bound, which always exists due to the finiteness of the Earth. 
Thus, 



k a Y l ^— < C(N) < k a e aa(M - md) V — 

jri (t N -U + c'Y a ~ {r{ {t, 



1 

ti+C 



by (N - i)r„ 



C(N)>k a J2 



+ c')p° 



(A23) 



for which the right-hand-side diverges for pa < 1 so that 
C(N) diverges. In the other part (i), we show that the expec- 
tation of Caa diverges using Jensen's inequality [e.g. Dur- 
rett, 2005]. Jensen's inequality theorem states that for any 
convex function g(x) (i.e., with non-negative second deriva- 
tive g" (x) > 0, V x, if the second derivative exists) and for 
any random variable £ with finite expectation, the following 
inequality holds true: 



EfcKO] > S(E[£]), 



(A24) 



where E[x] denotes the expectation of the random variable 
x. The equality sign holds true in (A24) only for a degen- 
erate distribution of £. Now we use g(x) = l/x ap , having 
checked that its second derivative is positive g"(x) > 0, and 
let the random variable be £ = t — U + c. Using Jensen's 
inequality: 



E 



1 



_(t-U + c)"f 



(E[t-U + c]) a P' 



(A25) 



Now if the ti's are assumed to be a stationary sequence 
(not even necessarily Poissonian), then E[t — ti + c] = 
const ■ (N — i) + c. Using this argument on each term in 
the sum leads to 



E 



^ {t - u + c y 



N i 

~ ~i (const ■ (N - j) + c) 



#26) 



But the term on the right hand side diverges — > oo as 
N — > oo for ap < 1 so that the expectation of the scale factor 
also diverges. This proves the second element of Proposition 
2. 

3. Finally, in the regime pa > 1 for which C(N) converges 
almost surely to a finite value as N — > oo, we show that 
r^P(N) remains a random variable dependent on the specific 
catalog distributed according to a non-degenerate distribu- 
(A21) tion. We rewrite (A19) as 



1. First, we show that, when pa > 1, the scale factor Caa 
converges to a finite value as N c —> oo under the assump- 
tion of an arbitrary small but finite minimum time interval 
< T m i„ << 1/A between events. In this case, we can 
further bound the right-hand-side of equation (A20) from 
above by replacing the intervals tN — U by (N — i)T m i n : 

n c 

C(N) < k - e ^(M-m d ) y i — (A22) 

V jr( ((A - l) ■ Train + C)P^ ' 

The sum in (A22) is in turn bounded from above by the 
Riemann zeta function C( a P) = Vi" P i which converges 
absolutely for ap > 1. This completes the proof that the 
scale factor Caa converges to a finite value as N c — > oo if 
pa > 1. 

2. Second, we show, when pa < 1, that (i) the expec- 
tation of the scale factor C(N) diverges as N c — > oo us- 
ing Jensen's inequality and (ii) C(N) diverges under the 
assumption of an arbitrarily large but finite maximum in- 
terval Tmax between events. In the latter case (ii), we can 
bound C(N) from below by replacing the intervals tN — U 



N 

C{N) = Y,UiXi, (A27) 

where Ui = l/(t - U + c) pa and X, = fc« e «<*K->"d). 
The scale factor C(N) is rewritten in (A27) as a randomly 
weighted sum of i.i.d. random variables Xi, where the ran- 
dom weights are functions of the random occurrence times 
and the random variables are the magnitude-dependent pro- 
ductivities. The weights are non-identically distributed and 
dependent while the Xi are i.i.d. We will show that for 
any fixed configuration of occurrence times and random Xi, 
C(N) remains distributed. For pa > 1, we have shown in 
1. that Wn = SiLi < oo for JV ^ oo. We then use 
the result quoted from Jamison et al. [1965]: "[if the sum 
Wn of the weights converges, then] C(N) /Wn [the normal- 
ized weighted sum] either fails to converge in probability 
or converges almost surely to a non-degenerate limit." In 
plain words, in the latter case, this means that the ran- 
dom variable S n /W n remains distributed according to a 
non-degenerate probability distribution, even in the limit 
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N — > oo. Thus, in both cases, the variance of the scale fac- 
tor remains non-zero. The intuition behind this result is that 
the convergence of the weights ensures that there are only 
a finite number of terms in the infinite sum that contribute 
to it. This implies that, notwithstanding the existence of an 
infinite number of contributions, the sum remains a random 
variable controlled by a finite number of them. This com- 
pletes the proof that the scale factor does not converge to 
a unique constant (a degenerate limit) when the exponent 
p/(a ■ v c ) > 1. 
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Table 1. Results of the mock RELM/CSEP experiment 
of daily forecasts over the period of one year for v c = 0.0: 
We checked that "models" are not rejected by the tests when 
the data is exact and no noise is present. "Models" are fore- 
casts generated from equation (7) using a noisy cluster center 
process which was perturbed from the original one by adding 
random noise of scale v c to the magnitudes. The first col- 
umn contains different perturbations of the original catalog 
corresponding to different forecasts or "models" (which, for 
v c = 0.0, are all equal). The second column is the total ex- 
pected number of events obtained by summing all daily fore- 
casts over all spatial bins over the one year period. The ex- 
pected number of events of the original catalog was 143.61. 
The third column shows the fraction {7) of the m simulated 
likelihoods less than the observed likelihood, averaged over all 
s modified observations. The fourth column shows the stan- 
dard deviation of the 7 values. The fifth column shows the 
fraction (<5) of the m simulated numbers of events less than 
the observed number of events, averaged over all s modified 
observations. The sixth column shows the standard deviation 
of the S values. 
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Table 2. Same as Table 1 but now perturbing the original 
catalog from which forecasts ("models") are generated by in- 
troducing noise v c = 0.1. The total expected number of events 
based on the exact forecast was E(N) = 136.31. The L test 
does not reject any models while the N test rejects 2 models. 
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Table 3. Same as Table 2 but now perturbing with stronger 
noise v c = 0.2. The total expected number of events based 
on the exact forecast was E(N) = 95.81. The L test does not 
reject any models while the N test rejects 9 models. 
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Table 4. Same as Table 2 but now perturbing with stronger 
noise v c = 0.3. The total expected number of events based on 
the exact forecast was E(N) = 186.65. The L test rejects 1 
model while the N test rejects 7 models. 
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Table 5. Same as Table 2 but now perturbing with stronger 
noise v a = 0.5. The total expected number of events based 
on the exact forecast was E(N) = 92.15. The L test does not 
reject any models while the N test rejects all 10 models. 
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Figure 1. Estimating intra-magnitude uncertainty by 
comparing moment magnitude estimates for the same 
event from the Harvard CMT and the USGS MT cat- 
alogs, a) Fixed kernel density estimate (solid) of the 
probability density function of the differences in moment 
magnitudes and maximum likelihood fit (dashed) of a 
Laplace (double-sided exponential) distribution given by 
equation (1) with scale parameter v c = 0.07. b) Same 
as a) but in semi-logarithmic scale, c) Semi-logarithmic 
plot of the survivor function (complementary cumulative 
distribution function), d) Semi-logarithmic plot of the 
cumulative distribution function. 
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Figure 2. Estimates of the e-folding scale parameter v c 
of the Laplace (double-sided exponential) distribution of 
equation (1) as a function of the threshold above which 
the data is fit. v c increases from 0.07 to about 0.1 due to 
fatter-than-exponential tails before starting to fluctuate 
more strongly due to finite sample effects. Error bars 
show 95% confidence intervals. 
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Figure 3. Median Absolute Differences (MAD) versus 
their associated coda duration magnitudes as reported 
by the NCSN in its authoritative region of the ANSS 
composite catalog. MAD values measure the variability 
of the magnitude estimates for the same event from dif- 
ferent stations as computed from the HYPOINVERSE 
program of the USGS. 
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Figure 4. Median Absolute Differences (MAD) reported 
by the NCSN in its authoritative region of the ANSS com- 
posite catalog. Top: kernel density estimate of the prob- 
ability density function (pdf). Middle: cumulative distri- 
bution function (CDF). Bottom: survivor function plot- 
ted on logarithmic axes. The dashed line at MAD=0.59 
corresponds to the 99th percentile of the distribution. 
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Figure 5. Median Absolute Differences (MAD) reported 
by the NCSN in its authoritative region of the ANSS 
composite catalog as a function of the number of stations 
involved in the computation of the magnitude and the 
MAD value. 
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Figure 6. Estimating infer-magnitude uncertainties by 
comparing the CMT moment magnitude Mw with its 
corresponding body wave magnitude mj (a) and surface 
wave magnitude Ms (b) from the PDE catalog. 
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Figure 7. Kernel density estimates of the probability 
density functions of the differences between the Harvard 
CMT moment magnitudes Mw and their corresponding 
body wave m& (dashed) and surface wave Ms (solid) mag- 
nitude estimates from the PDE. The means of the data 
are 0.26 for nib and 0.42 for Ms- The standard deviations 
are about 0.29 for irif, and 0.26 for Ms- 
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Figure 8. Differences between the coda duration magni- 
tude Mo and the local amplitude magnitude Ml versus 
Md in the NCSN catalog whenever at least one is larger 
than 3. 
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Figure 9. Kernel density estimate (solid) of the prob- 
ability density function of the differences A between the 
duration magnitude Mo and the local magnitude Ml re- 
ported by the NCSN. Also shown are a Gaussian fit with 
mean —0.0153 and standard deviation 0.3 (dashed); and 
a fit to a Laplace pdf with median —0.04 and e-folding 
scale parameter 0.2 (dotted). 
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Random Variable z 



Figure 10. Theoretical and simulated probability den- 
sity function (pdf) of the random variable z = (exp(a • 
e) — 1) for various choices of the scale parameter of the 
noise v c and assuming a — 2.3. The curves are shifted 
for clarity. 
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Figure 11. Simulated probability density functions of 
the "perturbed" seismic rates due to noisy magnitudes 
shown as a percent deviation from the true rate for noise 
levels v c = (0.1,0.2,0.3,0.5). We calculate seismic rates 
at a time lag dt after the 100th cluster center event. From 
top to bottom, dt — (0.0001, 0.1, 1, 10). Seismic rate esti- 
mates (and hence forecasts) are wildly distributed around 
the true value (solid vertical lines). 
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Deviation Percentage of Perturbed Rates from True Rate [in percent] 

Figure 12. Survivor functions of the simulated "per- 
turbed" seismic rate estimates (and forecasts) shown as 
a deviation in percent from the simulated "true" rate 
for noise levels v c = (0.1,0.2,0.3,0.5) and time lag since 
the last event in the process dt — 0.0001. The survivor 
functions are approximately parallel to the straight lines 
which are guides to the eye with theoretically predicted 
exponents given by a — (4.34,2.17,1.45,0.87), respec- 
tively. Even for small noise level v c = 0.2, 10 percent of 
the rate estimates over-predict the rate by 100 percent! 
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Figure 13. Same as Figure 12 except that perturbed 
and true rates are evaluated at dt = 10: log-log plots 
of the survivor functions of the simulated "perturbed" 
seismic rate estimates (and forecasts) shown as a devia- 
tion in percent from the simulated "true" rate for noise 
levels v c — (0.1,0.2,0.3,0.5). The slopes of the straight 
lines are given by the asymptotic predicted exponents 
a = (4.34,2.17, 1.45,0.87), respectively. 
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Mock RELM/CSEP Forecast Evaluation Experiment 



Choose model equation (7) 
Choose model parameters 



Simulate ^ 



Choose space-time test area: 
365 days = 1 year period 
700 km by 700 km 



^Space-time cluster center catalog (e.g. Figure 1 5)^ 



Exact 
catalog 



I 

Use model equation (7) 
■ 

Daily time bins: 365 
spatial bins: 30 by 30 



f Exact forecast | ■ 
^ . Reality" J ' 



I 

f Noisy j 
(catalog 1 J 

-1 



Perturb 



Calculate 



Choose noise level V, 
to be added to each 
magnitude 



| Noisy 
^ M 




forecast 1 
Model" 1 



f Noisy forecast j 
^ „Moder'j 



[Noisy forecast M 
„Model"M 




/ 



CSEP TESTING CENTER 

Sample I 



.Sim.obs." 
set 1 



4 



LSim.obs." j 
L set k J 

I 



LSim.obs." j 
^ set m J 

i 



\ 



For each sim. obs. set k=1 ...m calculate: 

(i) likelihood of sim. obs. set assuming ..model" j correct, 

(ii) total number of events of each sim.obs. set 
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Calculate obs. likelihood assuming model j and 
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N TEST: 
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Figure 14. Diagram explaining the numerical experi- 
ment designed to study the influence of magnitude noise 
on forecasts and their evaluation in daily earthquake fore- 
cast experiments such as RELM or CSEP. We mimic Cal- 
ifornia both in spatial test area and in model parameters 
and perform the likelihood (L) and number (N) test for 
daily forecasts over the period of one year using a spatio- 
temporal Poisson cluster center model that captures es- 
sential ingredients of most short-term seismicity models. 
The abbreviations mod., sim. and obs. stand for modi- 
fied, simulated and observations, respectively. 
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Figure 15. Example of a simulated cluster center cat- 
alog and resulting conditional intensity including after- 
shocks. Top: spatial distribution. Middle: Magnitudes 
of cluster centers against time over the course of one year. 
Bottom: Conditional intensity rate per day calculated us- 
ing the model equation (7) but summed over all bins. The 
rate is used as the "exact forecast" from which modified 
observations are generated. 



